# Sergei Vassilvitskii

## Contact Details

NameSergei Vassilvitskii |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Computer Science and Game Theory (5) Computer Science - Data Structures and Algorithms (4) Computer Science - Databases (2) Computer Science - Information Retrieval (1) Computer Science - Computational Engineering; Finance; and Science (1) Computer Science - Multiagent Systems (1) Computer Science - Information Theory (1) Computer Science - Learning (1) Computer Science - Computational Geometry (1) Mathematics - Information Theory (1) |

## Publications Authored By Sergei Vassilvitskii

We study the cost sharing problem for cooperative games in situations where the cost function $C$ is not available via oracle queries, but must instead be derived from data, represented as tuples $(S, C(S))$, for different subsets $S$ of players. We formalize this approach, which we call statistical cost sharing, and consider the computation of the core and the Shapley value, when the tuples are drawn from some distribution $\mathcal{D}$. Previous work by Balcan et al. Read More

Maximizing submodular functions under cardinality constraints lies at the core of numerous data mining and machine learning applications, including data diversification, data summarization, and coverage problems. In this work, we study this question in the context of data streams, where elements arrive one at a time, and we want to design low-memory and fast update-time algorithms that maintain a good solution. Specifically, we focus on the sliding window model, where we are asked to maintain a solution that considers only the last $W$ items. Read More

We study the question of setting and testing reserve prices in single item auctions when the bidders are not identical. At a high level, there are two generalizations of the standard second price auction: in the lazy version we first determine the winner, and then apply reserve prices; in the eager version we first discard the bidders not meeting their reserves, and then determine the winner among the rest. We show that the two versions have dramatically different properties: lazy reserves are easy to optimize, and A/B test in production, whereas eager reserves always lead to higher welfare, but their optimization is NP-complete, and naive A/B testing will lead to incorrect conclusions. Read More

Information distances like the Hellinger distance and the Jensen-Shannon divergence have deep roots in information theory and machine learning. They are used extensively in data analysis especially when the objects being compared are high dimensional empirical probability distributions built from data. However, we lack common tools needed to actually use information distances in applications efficiently and at scale with any kind of provable guarantees. Read More

We undertake a formal study of the value of targeting data to an advertiser. As expected, this value is increasing in the utility difference between realizations of the targeting data and the accuracy of the data, and depends on the distribution of competing bids. However, this value may vary non-monotonically with an advertiser's budget. Read More

Over half a century old and showing no signs of aging, k-means remains one of the most popular data processing algorithms. As is well-known, a proper initialization of k-means is crucial for obtaining a good final solution. The recently proposed k-means++ initialization algorithm achieves this, obtaining an initial set of centers that is provably close to the optimum solution. Read More

Motivated by the problem of optimizing allocation in guaranteed display advertising, we develop an efficient, lightweight method of generating a compact {\em allocation plan} that can be used to guide ad server decisions. The plan itself uses just O(1) state per guaranteed contract, is robust to noise, and allows us to serve (provably) nearly optimally. The optimization method we develop is scalable, with a small in-memory footprint, and working in linear time per iteration. Read More

A large fraction of online display advertising is sold via guaranteed contracts: a publisher guarantees to the advertiser a certain number of user visits satisfying the targeting predicates of the contract. The publisher is then tasked with solving the ad serving problem - given a user visit, which of the thousands of matching contracts should be displayed, so that by the expiration time every contract has obtained the requisite number of user visits. The challenges of the problem come from (1) the sheer size of the problem being solved, with tens of thousands of contracts and billions of user visits, (2) the unpredictability of user behavior, since these contracts are sold months ahead of time, when only a forecast of user visits is available and (3) the minute amount of resources available online, as an ad server must respond with a matching contract in a fraction of a second. Read More

The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any epsilon>0, our algorithms make O((log n)/log (1+epsilon)) passes over the input and find a subgraph whose density is guaranteed to be within a factor 2(1+epsilon) of the optimum. Read More

Many large-scale Web applications that require ranked top-k retrieval such as Web search and online advertising are implemented using inverted indices. An inverted index represents a sparse term-document matrix, where non-zero elements indicate the strength of term-document association. In this work, we present an approach for lossless compression of inverted indices. Read More

We discuss a multi-objective/goal programming model for the allocation of inventory of graphical advertisements. The model considers two types of campaigns: guaranteed delivery (GD), which are sold months in advance, and non-guaranteed delivery (NGD), which are sold using real-time auctions. We investigate various advertiser and publisher objectives such as (a) revenue from the sale of impressions, clicks and conversions, (b) future revenue from the sale of NGD inventory, and (c) "fairness" of allocation. Read More

Display advertising has traditionally been sold via guaranteed contracts -- a guaranteed contract is a deal between a publisher and an advertiser to allocate a certain number of impressions over a certain period, for a pre-specified price per impression. However, as spot markets for display ads, such as the RightMedia Exchange, have grown in prominence, the selection of advertisements to show on a given page is increasingly being chosen based on price, using an auction. As the number of participants in the exchange grows, the price of an impressions becomes a signal of its value. Read More

For most people, social contacts play an integral part in finding a new job. As observed by Granovetter's seminal study, the proportion of jobs obtained through social contacts is usually large compared to those obtained through postings or agencies. At the same time, job markets are a natural example of two-sided matching markets. Read More