Cognitive Inference of Demographic Data by User Ratings

Cognitive inference of user demographics, such as gender and age, plays an important role in creating user profiles for adjusting marketing strategies and generating personalized recommendations because user demographic data is usually not available due to data privacy concerns. At present, users can readily express feedback regarding products or services that they have purchased. During this process, user demographics are concealed, but the data has never yet been successfully utilized to contribute to the cognitive inference of user demographics. In this paper, we investigate the inference power of user ratings data, and propose a simple yet general cognitive inference model, called rating to profile (R2P), to infer user demographics from user provided ratings. In particular, the proposed R2P model can achieve the following: 1. Correctly integrate user ratings into model training. 2.Infer multiple demographic attributes of users simultaneously, capturing the underlying relevance between different demographic attributes. 3. Train its two components, i.e. feature extractor and classifier, in an integrated manner under a supervised learning paradigm, which effectively helps to discover useful hidden patterns from highly sparse ratings data. We introduce how to incorporate user ratings data into the research field of cognitive inference of user demographic data, and detail the model development and optimization process for the proposed R2P. Extensive experiments are conducted on two real-world ratings datasets against various compared state-of-the-art methods, and the results from multiple aspects demonstrate that our proposed R2P model can significantly improve on the cognitive inference performance of user demographic data.

Comments: This paper has been withdrawn by the author due to a crucial sign error in some equations and figures

Similar Publications

Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform knowledge base completion, i.e. Read More

It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age since older papers had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count. Read More

Text content can have different visual presentation ways with roughly similar characters. While conventional text image retrieval depends on complex model of OCR-based text recognition and text similarity detection, this paper proposes a new learning-based approach to text image retrieval with the purpose of finding out the original or similar text through a query text image. Firstly, features of text images are extracted by the CNN network to obtain the deep visual representations. Read More

A typical IR system that delivers and stores information is affected by problem of matching between user query and available content on web. Use of Ontology represents the extracted terms in form of network graph consisting of nodes, edges, index terms etc. The above mentioned IR approaches provide relevance thus satisfying users query. Read More

The number of documents available into Internet moves each day up. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. Read More

Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite of the success of previous approaches, they are, however, based on co-occurrence of items. Once there are no co-occurrence items available in documents, they will not work well. Read More

Learning an encoding of feature vectors in terms of an over-complete dictionary or a probabilistic information geometric (Fisher vectors) construct is wide-spread in statistical signal processing and computer vision. In content based information retrieval using deep-learning classifiers, such encodings are learnt on the flattened last layer, without adherence to the multi-linear structure of the underlying feature tensor. We illustrate a variety of feature encodings incl. Read More

Accurately evaluating new policies (e.g. ad-placement models, ranking functions, recommendation functions) is one of the key prerequisites for improving interactive systems. Read More

We present work on building a global long-tailed ranking of entities across multiple languages using Wikipedia and Freebase knowledge bases. We identify multiple features and build a model to rank entities using a ground-truth dataset of more than 10 thousand labels. The final system ranks 27 million entities with 75% precision and 48% F1 score. Read More