# Kunal Talwar

## Contact Details

NameKunal Talwar |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Data Structures and Algorithms (17) Computer Science - Cryptography and Security (5) Computer Science - Learning (5) Computer Science - Computational Geometry (3) Statistics - Machine Learning (3) Computer Science - Discrete Mathematics (2) Mathematics - Combinatorics (2) Computer Science - Computer Science and Game Theory (2) Computer Science - Computational Complexity (2) Mathematics - Number Theory (1) Computer Science - Distributed; Parallel; and Cluster Computing (1) Computer Science - Artificial Intelligence (1) |

## Publications Authored By Kunal Talwar

The online (uniform) buy-at-bulk network design problem asks us to design a network, where the edge-costs exhibit economy-of-scale. Previous approaches to this problem used tree- embeddings, giving us randomized algorithms. Moreover, the optimal results with a logarithmic competitive ratio requires the metric on which the network is being built to be known up-front; the competitive ratios then depend on the size of this metric (which could be much larger than the number of terminals that arrive). Read More

Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees for training data: Private Aggregation of Teacher Ensembles (PATE). Read More

Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Read More

High-dimensional sparse data present computational and statistical challenges for supervised learning. We propose compact linear sketches for reducing the dimensionality of the input, followed by a single layer neural network. We show that any sparse polynomial function can be computed, on nearly all sparse binary vectors, by a single layer neural network that takes a compact sketch of the vector as input. Read More

**Authors:**Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viegas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, Xiaoqiang Zheng

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. Read More

A natural measure of smoothness of a Boolean function is its sensitivity (the largest number of Hamming neighbors of a point which differ from it in function value). The structure of smooth or equivalently low-sensitivity functions is still a mystery. A well-known conjecture states that every such Boolean function can be computed by a shallow decision tree. Read More

Empirical Risk Minimization (ERM) is a standard technique in machine learning, where a model is selected by minimizing a loss function over constraint set. When the training dataset consists of private information, it is natural to use a differentially private ERM algorithm, and this problem has been the subject of a long line of work started with Chaudhuri and Monteleoni 2008. A private ERM algorithm outputs an approximate minimizer of the loss function and its error can be measured as the difference from the optimal value of the loss function. Read More

Document sketching using Jaccard similarity has been a workable effective technique in reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning applications. Min-wise sampling can be used to derive an unbiased estimator for Jaccard similarity and taking a few hundred independent consistent samples leads to compact sketches which provide good estimates of pairwise-similarity. Subsequent works extended this technique to weighted sets and show how to produce samples with only a constant number of hash evaluations for any element, independent of its weight. Read More

The $\gamma_2$ norm of a real $m\times n$ matrix $A$ is the minimum number $t$ such that the column vectors of $A$ are contained in a $0$-centered ellipsoid $E\subseteq\mathbb{R}^m$ which in turn is contained in the hypercube $[-t, t]^m$. We prove that this classical quantity approximates the \emph{hereditary discrepancy} $\mathrm{herdisc}\ A$ as follows: $\gamma_2(A) = {O(\log m)}\cdot \mathrm{herdisc}\ A$ and $\mathrm{herdisc}\ A = O(\sqrt{\log m}\,)\cdot\gamma_2(A) $. Since $\gamma_2$ is polynomial-time computable, this gives a polynomial-time approximation algorithm for hereditary discrepancy. Read More

This paper is motivated by the fact that many systems need to be maintained continually while the underlying costs change over time. The challenge is to continually maintain near-optimal solutions to the underlying optimization problems, without creating too much churn in the solution itself. We model this as a multistage combinatorial optimization problem where the input is a sequence of cost functions (one for each time step); while we can change the solution from step to step, we incur an additional cost for every such change. Read More

The Discrepancy of a hypergraph is the minimum attainable value, over two-colorings of its vertices, of the maximum absolute imbalance of any hyperedge. The Hereditary Discrepancy of a hypergraph, defined as the maximum discrepancy of a restriction of the hypergraph to a subset of its vertices, is a measure of its complexity. Lovasz, Spencer and Vesztergombi (1986) related the natural extension of this quantity to matrices to rounding algorithms for linear programs, and gave a determinant based lower bound on the hereditary discrepancy. Read More

We prove that any graph excluding $K_r$ as a minor has can be partitioned into clusters of diameter at most $\Delta$ while removing at most $O(r/\Delta)$ fraction of the edges. This improves over the results of Fakcharoenphol and Talwar, who building on the work of Klein, Plotkin and Rao gave a partitioning that required to remove $O(r^2/\Delta)$ fraction of the edges. Our result is obtained by a new approach to relate the topological properties (excluding a minor) of a graph to its geometric properties (the induced shortest path metric). Read More

We provide a relatively simple proof that the expected gap between the maximum load and the average load in the two choice process is bounded by $(1+o(1))\log \log n$, irrespective of the number of balls thrown. The theorem was first proven by Berenbrink et al. Their proof uses heavy machinery from Markov-Chain theory and some of the calculations are done using computers. Read More

We show that the hereditary discrepancy of homogeneous arithmetic progressions is lower bounded by $n^{1/O(\log \log n)}$. This bound is tight up to the constant in the exponent. Our lower bound goes via proving an exponential lower bound on the discrepancy of set systems of subcubes of the boolean cube $\{0, 1\}^d$. Read More

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $\beta$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Read More

Consider the problem of partitioning an arbitrary metric space into pieces of diameter at most \Delta, such every pair of points is separated with relatively low probability. We propose a rate-based algorithm inspired by multiplicatively-weighted Voronoi diagrams, and prove it has optimal trade-offs. This also gives us another logarithmic approximation algorithm for the 0-extension problem. Read More

We give a 2-approximation algorithm for Non-Uniform Sparsest Cut that runs in time $n^{O(k)}$, where $k$ is the treewidth of the graph. This improves on the previous $2^{2^k}$-approximation in time $\poly(n) 2^{O(k)}$ due to Chlamt\'a\v{c} et al. To complement this algorithm, we show the following hardness results: If the Non-Uniform Sparsest Cut problem has a $\rho$-approximation for series-parallel graphs (where $\rho \geq 1$), then the Max Cut problem has an algorithm with approximation factor arbitrarily close to $1/\rho$. Read More

In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \R^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. Read More

We advance the approach initiated by Chawla et al. for sanitizing (census) data so as to preserve the privacy of respondents while simultaneously extracting "useful" statistical information. First, we extend the scope of their techniques to a broad and rich class of distributions, specifically, mixtures of highdimensional balls, spheres, Gaussians, and other "nice" distributions. Read More

Given a capacitated graph $G = (V,E)$ and a set of terminals $K \subseteq V$, how should we produce a graph $H$ only on the terminals $K$ so that every (multicommodity) flow between the terminals in $G$ could be supported in $H$ with low congestion, and vice versa? (Such a graph $H$ is called a flow-sparsifier for $G$.) What if we want $H$ to be a "simple" graph? What if we allow $H$ to be a convex combination of simple graphs? Improving on results of Moitra [FOCS 2009] and Leighton and Moitra [STOC 2010], we give efficient algorithms for constructing: (a) a flow-sparsifier $H$ that maintains congestion up to a factor of $O(\log k/\log \log k)$, where $k = |K|$, (b) a convex combination of trees over the terminals $K$ that maintains congestion up to a factor of $O(\log k)$, and (c) for a planar graph $G$, a convex combination of planar graphs that maintains congestion up to a constant factor. This requires us to give a new algorithm for the 0-extension problem, the first one in which the preimages of each terminal are connected in $G$. Read More

In this paper we show how the complexity of performing nearest neighbor (NNS) search on a metric space is related to the expansion of the metric space. Given a metric space we look at the graph obtained by connecting every pair of points within a certain distance $r$ . We then look at various notions of expansion in this graph relating them to the cell probe complexity of NNS for randomized and deterministic, exact and approximate algorithms. Read More

Constrained submodular maximization problems have long been studied, with near-optimal results known under a variety of constraints when the submodular function is monotone. The case of non-monotone submodular maximization is less understood: the first approximation algorithms even for the unconstrainted setting were given by Feige et al. (FOCS '07). Read More

We consider the noise complexity of differentially private mechanisms in the setting where the user asks $d$ linear queries $f\colon\Rn\to\Re$ non-adaptively. Here, the database is represented by a vector in $\Rn$ and proximity between databases is measured in the $\ell_1$-metric. We show that the noise complexity is determined by two geometric parameters associated with the set of queries. Read More

Consider the following problem: given a metric space, some of whose points are "clients", open a set of at most $k$ facilities to minimize the average distance from the clients to these facilities. This is just the well-studied $k$-median problem, for which many approximation algorithms and hardness results are known. Note that the objective function encourages opening facilities in areas where there are many clients, and given a solution, it is often possible to get a good idea of where the clients are located. Read More

In recent years, considerable advances have been made in the study of properties of metric spaces in terms of their doubling dimension. This line of research has not only enhanced our understanding of finite metrics, but has also resulted in many algorithmic applications. However, we still do not understand the interaction between various graph-theoretic (topological) properties of graphs, and the doubling (geometric) properties of the shortest-path metrics induced by them. Read More