# Krzysztof Onak

## Contact Details

NameKrzysztof Onak |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Data Structures and Algorithms (11) Computer Science - Computational Geometry (2) Computer Science - Distributed; Parallel; and Cluster Computing (1) Computer Science - Computational Complexity (1) |

## Publications Authored By Krzysztof Onak

We initiate the study of property testing in arbitrary planar graphs. We prove that bipartiteness can be tested in constant time: for every planar graph $G$ and $\epsilon>0$, we can distinguish in constant time between the case that $G$ is bipartite and the case that $G$ is $\epsilon$-far from bipartite. The previous bound for this class of graphs was $\tilde{O}(\sqrt{n})$, where $n$ is the number of vertices, and the constant-time testability was only known for planar graphs with bounded degree. Read More

We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+\epsilon)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). Read More

We prove $n^{1+\Omega(1/p)}/p^{O(1)}$ lower bounds for the space complexity of $p$-pass streaming algorithms solving the following problems on $n$-vertex graphs: * testing if an undirected graph has a perfect matching (this implies lower bounds for computing a maximum matching or even just the maximum matching size), * testing if two specific vertices are at distance at most $2(p+1)$ in an undirected graph, * testing if there is a directed path from $s$ to $t$ for two specific vertices $s$ and $t$ in a directed graph. Prior to our result, it was known that these problems require $\Omega(n^2)$ space in one pass, but no $n^{1+\Omega(1)}$ lower bound was known for any $p\ge 2$. These streaming results follow from a communication complexity lower bound for a communication game in which the players hold two graphs on the same set of vertices. Read More

We give a nearly optimal sublinear-time algorithm for approximating the size of a minimum vertex cover in a graph G. The algorithm may query the degree deg(v) of any vertex v of its choice, and for each 1 <= i <= deg(v), it may ask for the i-th neighbor of v. Letting VC_opt(G) denote the minimum size of vertex cover in G, the algorithm outputs, with high constant success probability, an estimate VC_estimate(G) such that VC_opt(G) <= VC_estimate(G) <= 2 * VC_opt(G) + epsilon*n, where epsilon is a given additive approximation parameter. Read More

We show how to compute the edit distance between two strings of length n up to a factor of 2^{\~O(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{\~O(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time. Read More

Partitioning oracles were introduced by Hassidim et al. (FOCS 2009) as a generic tool for constant-time algorithms. For any epsilon > 0, a partitioning oracle provides query access to a fixed partition of the input bounded-degree minor-free graph, in which every component has size poly(1/epsilon), and the number of edges removed is at most epsilon*n, where n is the number of vertices in the graph. Read More

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector $x=(x_1,. Read More

Let $\mathcal{T}$ be a rooted and weighted tree, where the weight of any node is equal to the sum of the weights of its children. The popular Treemap algorithm visualizes such a tree as a hierarchical partition of a square into rectangles, where the area of the rectangle corresponding to any node in $\mathcal{T}$ is equal to the weight of that node. The aspect ratio of the rectangles in such a rectangular partition necessarily depends on the weights and can become arbitrarily high. Read More

We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed epsilon>0, it can compute a (log n)^O(1/epsilon) approximation in n^(1+epsilon) time. This is an exponential improvement over the previously known factor, 2^(O (sqrt(log n))), with a comparable running time (Ostrovsky and Rabani J.ACM 2007; Andoni and Onak STOC 2009). Read More

We consider the problem of testing distribution identity. Given a sequence of independent samples from an unknown distribution on a domain of size n, the goal is to check if the unknown distribution approximately equals a known distribution on the same domain. While Batu, Fortnow, Fischer, Kumar, Rubinfeld, and White (FOCS 2001) proved that the sample complexity of the problem is O~(sqrt(n) * poly(1/epsilon)), the running time of their tester is much higher: O(n) + O~(sqrt(n) * poly(1/epsilon)). Read More

Estimating frequency moments of data streams is a very well studied problem and tight bounds are known on the amount of space that is necessary and sufficient when the stream is adversarially ordered. Recently, motivated by various practical considerations and applications in learning and statistics, there has been growing interest into studying streams that are randomly ordered. In the paper we improve the previous lower bounds on the space required to estimate the frequency moments of a randomly ordered streams. Read More

We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-optimal bounds in the insertion-only model without sketching. Our high-level approach is simple: we give algorithms to estimate Renyi and Tsallis entropy, and use them to extrapolate an estimate of Shannon entropy. Read More