Revisiting the Examination Hypothesis with Query Specific Position Bias

Click through rates (CTR) offer useful user feedback that can be used to infer the relevance of search results for queries. However it is not very meaningful to look at the raw click through rate of a search result because the likelihood of a result being clicked depends not only on its relevance but also the position in which it is displayed. One model of the browsing behavior, the {\em Examination Hypothesis} \cite{RDR07,Craswell08,DP08}, states that each position has a certain probability of being examined and is then clicked based on the relevance of the search snippets. This is based on eye tracking studies \cite{Claypool01, GJG04} which suggest that users are less likely to view results in lower positions. Such a position dependent variation in the probability of examining a document is referred to as {\em position bias}. Our main observation in this study is that the position bias tends to differ with the kind of information the user is looking for. This makes the position bias {\em query specific}. In this study, we present a model for analyzing a query specific position bias from the click data and use these biases to derive position independent relevance values of search results. Our model is based on the assumption that for a given query, the positional click through rate of a document is proportional to the product of its relevance and a {\em query specific} position bias. We compare our model with the vanilla examination hypothesis model (EH) on a set of queries obtained from search logs of a commercial search engine. We also compare it with the User Browsing Model (UBM) \cite{DP08} which extends the cascade model of Craswell et al\cite{Craswell08} by incorporating multiple clicks in a query session. We show that the our model, although much simpler to implement, consistently outperforms both EH and UBM on well-used measures such as relative error and cross entropy.


Similar Publications

The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Read More


We study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative-filtering methods to make unfair predictions for users from minority groups. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. Read More


We describe the results of a qualitative study on journalists' information seeking behavior on social media. Based on interviews with eleven journalists along with a study of a set of university level journalism modules, we determined the categories of information need types that lead journalists to social media. We also determined the ways that social media is exploited as a tool to satisfy information needs and to define influential factors, which impacted on journalists' information seeking behavior. Read More


Automated music playlist generation is a specific form of music recommendation. Generally stated, the user receives a set of song suggestions defining a coherent listening session. We hypothesize that the best way to convey such playlist coherence to new recommendations is by learning it from actual curated examples, in contrast to imposing ad hoc constraints. Read More


The extraction of individual reference strings from the reference section of scientific publications is an important step in the citation extraction pipeline. Current approaches divide this task into two steps by first detecting the reference section areas and then grouping the text lines in such areas into reference strings. We propose a classification model that considers every line in a publication as a potential part of a reference string. Read More


Social media platforms contain a great wealth of information which provides opportunities for us to explore hidden patterns or unknown correlations, and understand people's satisfaction with what they are discussing. As one showcase, in this paper, we present a system, TwiInsight which explores the insight of Twitter data. Different from other Twitter analysis systems, TwiInsight automatically extracts the popular topics under different categories (e. Read More


Citation texts are sometimes not very informative or in some cases inaccurate by themselves; they need the appropriate context from the referenced paper to reflect its exact contributions. To address this problem, we propose an unsupervised model that uses distributed representation of words as well as domain knowledge to extract the appropriate context from the reference paper. Evaluation results show the effectiveness of our model by significantly outperforming the state-of-the-art. Read More


Many learning-to-rank (LtR) algorithms focus on query-independent model, in which query and document do not lie in the same feature space, and the rankers rely on the feature ensemble about query-document pair instead of the similarity between query instance and documents. However, existing algorithms do not consider local structures in query-document feature space, and are fragile to irrelevant noise features. In this paper, we propose a novel Riemannian metric learning algorithm to capture the local structures and develop a robust LtR algorithm. Read More


Making personalized and context-aware suggestions of venues to the users is very crucial in venue recommendation. These suggestions are often based on matching the venues' features with the users' preferences, which can be collected from previously visited locations. In this paper we present a novel user-modeling approach which relies on a set of scoring functions for making personalized suggestions of venues based on venues content and reviews as well as users context. Read More


The vision of the Semantic Web (SW) is gradually unfolding and taking shape through a web of linked data, a part of which is built by capturing semantics stored in existing knowledge organization systems (KOS), subject metadata and resource metadata. The content of vast bibliographic collections is currently categorized by some widely used bibliographic classification and we may soon see them being mined for information and linked in a meaningful way across the Web. Bibliographic classifications are designed for knowledge mediation which offers both a rich terminology and different ways in which concepts can be categorized and related to each other in the universe of knowledge. Read More