# Jelani Nelson

## Contact Details

NameJelani Nelson |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Data Structures and Algorithms (23) Mathematics - Information Theory (9) Computer Science - Information Theory (9) Computer Science - Computational Complexity (6) Mathematics - Probability (6) Computer Science - Computational Geometry (5) Computer Science - Discrete Mathematics (3) Computer Science - Learning (3) Mathematics - Functional Analysis (2) Statistics - Machine Learning (2) |

## Publications Authored By Jelani Nelson

In insertion-only streaming, one sees a sequence of indices $a_1, a_2, \ldots, a_m\in [n]$. The stream defines a sequence of $m$ frequency vectors $x^{(1)},\ldots,x^{(m)}\in\mathbb{R}^n$ with $(x^{(t)})_i = |\{j : j\in[t], a_j = i\}|$. That is, $x^{(t)}$ is the frequency vector after seeing the first $t$ items in the stream. Read More

In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n,\log(1/\delta)\log^2(\frac n{\log(1/\delta)})\})$ for failure probability $\delta$. Read More

In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x$ and $y$ in $\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We prove that the randomized one-way communication complexity of this problem in the public coin model is exactly $\Theta(\min\{n, \log(1/\delta)\log^2(\frac{n}{\log(1/\delta)})\})$ bits for failure probability $\delta$. Read More

For any integers $d, n \geq 2$ and $1/({\min\{n,d\}})^{0.4999} < \varepsilon<1$, we show the existence of a set of $n$ vectors $X\subset \mathbb{R}^d$ such that any embedding $f:X\rightarrow \mathbb{R}^m$ satisfying $$ \forall x,y\in X,\ (1-\varepsilon)\|x-y\|_2^2\le \|f(x)-f(y)\|_2^2 \le (1+\varepsilon)\|x-y\|_2^2 $$ must have $$ m = \Omega(\varepsilon^{-2} \lg n). $$ This lower bound matches the upper bound given by the Johnson-Lindenstrauss lemma [JL84]. Read More

In compressed sensing, one wishes to acquire an approximately sparse high-dimensional signal $x\in\mathbb{R}^n$ via $m\ll n$ noisy linear measurements, then later approximately recover $x$ given only those measurement outcomes. Various guarantees have been studied in terms of the notion of approximation in recovery, and some isolated folklore results are known stating that some forms of recovery are stronger than others, via black-box reductions. In this note we provide a general theorem concerning the hierarchy of strengths of various recovery guarantees. Read More

In turnstile $\ell_p$ $\varepsilon$-heavy hitters, one maintains a high-dimensional $x\in\mathbb{R}^n$ subject to $\texttt{update}(i,\Delta)$ causing $x_i\leftarrow x_i + \Delta$, where $i\in[n]$, $\Delta\in\mathbb{R}$. Upon receiving a query, the goal is to report a small list $L\subset[n]$, $|L| = O(1/\varepsilon^p)$, containing every "heavy hitter" $i\in[n]$ with $|x_i| \ge \varepsilon \|x_{\overline{1/\varepsilon^p}}\|_p$, where $x_{\overline{k}}$ denotes the vector obtained by zeroing out the largest $k$ entries of $x$ in magnitude. For any $p\in(0,2]$ the CountSketch solves $\ell_p$ heavy hitters using $O(\varepsilon^{-p}\log n)$ words of space with $O(\log n)$ update time, $O(n\log n)$ query time to output $L$, and whose output after any query is correct with high probability (whp) $1 - 1/poly(n)$. Read More

The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In sub-polynomial space, the strongest guarantee available is the $\ell_2$ guarantee, which requires finding all items that occur at least $\varepsilon\|f\|_2$ times in the stream, where the $i$th coordinate of the vector $f$ is the number of occurrences of $i$ in the stream. The first algorithm to achieve the $\ell_2$ guarantee was the CountSketch of [CCF04], which for constant $\varepsilon$ requires $O(\log n)$ words of memory and $O(\log n)$ update time, and is known to be space-optimal if the stream allows for deletions. Read More

In "dictionary learning" we observe $Y = AX + E$ for some $Y\in\mathbb{R}^{n\times p}$, $A \in\mathbb{R}^{m\times n}$, and $X\in\mathbb{R}^{m\times p}$. The matrix $Y$ is observed, and $A, X, E$ are unknown. Here $E$ is "noise" of small norm, and $X$ is column-wise sparse. Read More

We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows. Here $\tilde{r}$ is the maximum stable rank, i.e. Read More

We consider a simple model of imprecise comparisons: there exists some $\delta>0$ such that when a subject is given two elements to compare, if the values of those elements (as perceived by the subject) differ by at least $\delta$, then the comparison will be made correctly; when the two elements have values that are within $\delta$, the outcome of the comparison is unpredictable. This model is inspired by both imprecision in human judgment of values and also by bounded but potentially adversarial errors in the outcomes of sporting tournaments. Our model is closely related to a number of models commonly considered in the psychophysics literature where $\delta$ corresponds to the {\em just noticeable difference unit (JND)} or {\em difference threshold}. Read More

For any $n>1$ and $0<\varepsilon<1/2$, we show the existence of an $n^{O(1)}$-point subset $X$ of $\mathbb{R}^n$ such that any linear map from $(X,\ell_2)$ to $\ell_2^m$ with distortion at most $1+\varepsilon$ must have $m = \Omega(\min\{n, \varepsilon^{-2}\log n\})$. Our lower bound matches the upper bounds provided by the identity matrix and the Johnson-Lindenstrauss lemma, improving the previous lower bound of Alon by a $\log(1/\varepsilon)$ factor. Read More

We say a turnstile streaming algorithm is "non-adaptive" if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during queries may be decided upon adaptively. All known turnstile streaming algorithms in the literature are non-adaptive. Read More

Let $\Phi\in\mathbb{R}^{m\times n}$ be a sparse Johnson-Lindenstrauss transform [KN14] with $s$ non-zeroes per column. For a subset $T$ of the unit sphere, $\varepsilon\in(0,1/2)$ given, we study settings for $m,s$ required to ensure $$ \mathop{\mathbb{E}}_\Phi \sup_{x\in T} \left|\|\Phi x\|_2^2 - 1 \right| < \varepsilon , $$ i.e. Read More

An oblivious subspace embedding (OSE) for some eps, delta in (0,1/3) and d <= m <= n is a distribution D over R^{m x n} such that for any linear subspace W of R^n of dimension d, Pr_{Pi ~ D}(for all x in W, (1-eps) |x|_2 <= |Pi x|_2 <= (1+eps)|x|_2) >= 1 - delta. We prove that any OSE with delta < 1/3 must have m = Omega((d + log(1/delta))/eps^2), which is optimal. Furthermore, if every Pi in the support of D is sparse, having at most s non-zero entries per column, then we show tradeoff lower bounds between m and s. Read More

In compressed sensing, the "restricted isometry property" (RIP) is a sufficient condition for the efficient reconstruction of a nearly k-sparse vector x in C^d from m linear measurements Phi x. It is desirable for m to be small, and for Phi to support fast matrix-vector multiplication. In this work, we give a randomized construction of RIP matrices Phi in C^{m x d}, preserving the L_2 norms of all k-sparse vectors with distortion 1+eps, where the matrix-vector multiply Phi x can be computed in nearly linear time. Read More

We give near-tight lower bounds for the sparsity required in several dimensionality reducing linear maps. First, consider the JL lemma which states that for any set of n vectors in R there is a matrix A in R^{m x d} with m = O(eps^{-2}log n) such that mapping by A preserves pairwise Euclidean distances of these n vectors up to a 1 +/- eps factor. We show that there exists a set of n vectors such that any such matrix A with at most s non-zero entries per column must have s = Omega(eps^{-1}log n/log(1/eps)) as long as m < O(n/log(1/eps)). Read More

An "oblivious subspace embedding (OSE)" given some parameters eps,d is a distribution D over matrices B in R^{m x n} such that for any linear subspace W in R^n with dim(W) = d it holds that Pr_{B ~ D}(forall x in W ||B x||_2 in (1 +/- eps)||x||_2) > 2/3 We show an OSE exists with m = O(d^2/eps^2) and where every B in the support of D has exactly s=1 non-zero entries per column. This improves previously best known bound in [Clarkson-Woodruff, arXiv:1207.6365]. Read More

We study classic streaming and sparse recovery problems using deterministic linear sketches, including l1/l1 and linf/l1 sparse recovery problems (the latter also being known as l1-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix A in R^{m x n} and a deterministic recovery/estimation procedure which work for all possible input vectors simultaneously. Our results improve upon existing work, the following being our main contributions: * A proof that linf/l1 sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Read More

We give two different and simple constructions for dimensionality reduction in $\ell_2$ via linear mappings that are sparse: only an $O(\varepsilon)$-fraction of entries in each column of our embedding matrices are non-zero to achieve distortion $1+\varepsilon$ with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters, improving upon previous works of Achlioptas (JCSS 2003) and Dasgupta, Kumar, and Sarl\'{o}s (STOC 2010). Such distributions can be used to speed up applications where $\ell_2$ dimensionality reduction is used. Read More

We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous space-optimal algorithm of [Kane-Nelson-Woodruff, SODA 2010], which had update time Omega(1/eps^2). Read More

Recent work of [Dasgupta-Kumar-Sarlos, STOC 2010] gave a sparse Johnson-Lindenstrauss transform and left as a main open question whether their construction could be efficiently derandomized. We answer their question affirmatively by giving an alternative proof of their result requiring only bounded independence hash functions. Furthermore, the sparsity bound obtained in our proof is improved. Read More

Let x be a random vector coming from any k-wise independent distribution over {-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is determined up to an additive epsilon for k = poly(1/epsilon). This answers an open question of Diakonikolas et al. Read More

We give the first L_1-sketching algorithm for integer vectors which produces nearly optimal sized sketches in nearly linear time. This answers the first open problem in the list of open problems from the 2006 IITK Workshop on Algorithms for Data Streams. Specifically, suppose Alice receives a vector x in {-M,. Read More

The problem of estimating the pth moment F_p (p nonnegative and real) in data streams is as follows. There is a vector x which starts at 0, and many updates of the form x_i <-- x_i + v come sequentially in a stream. The algorithm also receives an error parameter 0 < eps < 1. Read More

We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-optimal bounds in the insertion-only model without sketching. Our high-level approach is simple: we give algorithms to estimate Renyi and Tsallis entropy, and use them to extrapolate an estimate of Shannon entropy. Read More