Cognitive Inference of Demographic Data by User Ratings

Cognitive inference of user demographics, such as gender and age, plays an important role in creating user profiles for adjusting marketing strategies and generating personalized recommendations, because user demographic data is usually unavailable due to data privacy concerns. At present, users can readily express feedback on products or services that they have purchased. During this process, user demographics are concealed, and the resulting ratings data has not yet been successfully used for the cognitive inference of user demographics. In this paper, we investigate the inference power of user ratings data and propose a simple yet general cognitive inference model, called rating to profile (R2P), to infer user demographics from user-provided ratings. In particular, the proposed R2P model can (1) correctly integrate user ratings into model training; (2) infer multiple demographic attributes of users simultaneously, capturing the underlying relevance between different demographic attributes; and (3) train its two components, i.e., the feature extractor and the classifier, in an integrated manner under a supervised learning paradigm, which helps to discover useful hidden patterns in highly sparse ratings data. We describe how to incorporate user ratings data into the cognitive inference of user demographic data, and detail the model development and optimization process for the proposed R2P. Extensive experiments are conducted on two real-world ratings datasets against several state-of-the-art methods, and the results from multiple aspects demonstrate that the proposed R2P model can significantly improve the cognitive inference performance for user demographic data.
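The abstract does not give the R2P architecture in detail, so the following is only a minimal sketch of the general idea it describes: a shared feature extractor over a sparse ratings vector, feeding one classifier head per demographic attribute, scored with a single joint supervised loss. Every name, layer size, the tanh activation, the toy ratings matrix, and the labels are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(n_items, n_hidden, n_gender, n_age):
    """Hypothetical parameters: one shared extractor, one head per attribute."""
    return {
        "W_shared": rng.normal(0.0, 0.1, (n_items, n_hidden)),
        "W_gender": rng.normal(0.0, 0.1, (n_hidden, n_gender)),
        "W_age": rng.normal(0.0, 0.1, (n_hidden, n_age)),
    }


def forward(params, ratings):
    """Map each user's ratings row to class probabilities for both attributes."""
    hidden = np.tanh(ratings @ params["W_shared"])  # shared features
    p_gender = softmax(hidden @ params["W_gender"])
    p_age = softmax(hidden @ params["W_age"])
    return p_gender, p_age


def joint_loss(p_gender, p_age, y_gender, y_age):
    """Sum of the two mean cross-entropies, so both heads shape the extractor."""
    rows = np.arange(len(y_gender))
    ce_gender = -np.log(p_gender[rows, y_gender]).mean()
    ce_age = -np.log(p_age[rows, y_age]).mean()
    return float(ce_gender + ce_age)


# Toy data (assumed): 4 users x 6 items, ratings 1-5, 0 means "not rated".
ratings = np.array(
    [
        [5, 0, 0, 3, 0, 0],
        [0, 4, 0, 0, 0, 1],
        [0, 0, 2, 0, 5, 0],
        [1, 0, 0, 0, 0, 4],
    ],
    dtype=float,
)
y_gender = np.array([0, 1, 1, 0])  # assumed labels, 2 classes
y_age = np.array([2, 0, 1, 1])     # assumed labels, 3 age groups

params = init_params(n_items=6, n_hidden=8, n_gender=2, n_age=3)
p_gender, p_age = forward(params, ratings)
loss = joint_loss(p_gender, p_age, y_gender, y_age)
print(p_gender.shape, p_age.shape, loss)
```

Because both heads read the same hidden features, minimizing the summed loss would update the extractor with signal from both attributes at once, which is one common way to capture relevance between attributes. The sketch covers only the forward pass and the loss; a training loop is left out.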

Comments: This paper has been withdrawn by the author due to a crucial sign error in some equations and figures

Similar Publications

We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model. While citations have been previously used in generating scientific summaries, they lack the related context from the referenced article and therefore do not accurately reflect the article's content. Our method overcomes the problem of inconsistency between the citation summary and the article's content by providing context for each citation.


Item features play an important role in movie recommender systems, where recommendations can be generated by using explicit or implicit preferences of users on traditional features (attributes) such as tag, genre, and cast. Typically, movie features are human-generated, either editorially (e.g.


Recommender systems nowadays have many applications and are of great economic benefit. Hence, it is imperative for success-oriented companies to compare different such systems and select the best one for their purposes. To this end, various metrics of predictive accuracy are commonly used, such as the Root Mean Square Error (RMSE), or precision and recall.


Duplication, whether exact or partial, is a common issue in many datasets. In clinical notes data, duplication (and near duplication) can arise for many reasons, such as the pervasive use of templates, copy-pasting, or notes being generated by automated procedures. A key challenge in removing such near duplicates is the size of such datasets; our own dataset consists of more than 10 million notes.


Due to its promise to alleviate information overload, text summarization has attracted the attention of many researchers. However, it has remained a serious challenge. Here, we first prove empirical limits on the recall (and F1-scores) of extractive summarizers on the DUC datasets under ROUGE evaluation for both the single-document and multi-document summarization tasks.


In this paper, we propose a novel approach for aggregating online reviews, according to the opinions they express. Our methodology is unsupervised - due to the fact that it does not rely on pre-labeled reviews - and it is agnostic - since it does not make any assumption about the domain or the language of the review content. We measure the adherence of a review content to the domain terminology extracted from a review set.


This paper presents the approach developed at the Faculty of Engineering of University of Porto, to participate in SemEval 2017, Task 5: Fine-grained Sentiment Analysis on Financial Microblogs and News. The task consisted in predicting a real continuous variable from -1.0 to +1.


In this paper we propose the creation of generic LSH families for the angular distance based on Johnson-Lindenstrauss projections. We show that feature hashing is a valid J-L projection and propose two new LSH families based on feature hashing. These new LSH families are tested on both synthetic and real datasets with very good results and a considerable performance improvement over other LSH families.


The task of next POI recommendation has been studied extensively in recent years. However, developing a unified recommendation framework to incorporate multiple factors associated with both POIs and users remains challenging, because of the heterogeneous nature of this information. Further, effective mechanisms to handle cold-start and endow the system with interpretability are also difficult topics.


Search engines play an important role in our everyday lives by assisting us in finding the information we need. When we input a complex query, however, results are often far from satisfactory. In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned.