# Praneeth Netrapalli

## Contact Details

NamePraneeth Netrapalli |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Learning (17) Statistics - Machine Learning (17) Mathematics - Optimization and Control (10) Computer Science - Data Structures and Algorithms (7) Mathematics - Numerical Analysis (4) Computer Science - Information Theory (4) Mathematics - Information Theory (4) Computer Science - Numerical Analysis (2) Mathematics - Probability (2) Physics - Physics and Society (1) Computer Science - Artificial Intelligence (1) Mathematics - Statistics (1) Computer Science - Neural and Evolutionary Computing (1) Physics - Statistical Mechanics (1) Statistics - Theory (1) Computer Science - Computational Complexity (1) |

## Publications Authored By Praneeth Netrapalli

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. Read More

Understanding the singular value spectrum of a matrix $A \in \mathbb{R}^{n \times n}$ is a fundamental task in countless applications. In matrix multiplication time, it is possible to perform a full SVD and directly compute the singular values $\sigma_1,.. Read More

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). Read More

We consider the problem of outlier robust PCA (OR-PCA) where the goal is to recover principal directions despite the presence of outlier data points. That is, given a data matrix $M^*$, where $(1-\alpha)$ fraction of the points are noisy samples from a low-dimensional subspace while $\alpha$ fraction of the points can be arbitrary outliers, the goal is to recover the subspace accurately. Existing results for \OR-PCA have serious drawbacks: while some results are quite weak in the presence of noise, other results have runtime quadratic in dimension, rendering them impractical for large scale applications. Read More

This work characterizes the benefits of averaging techniques widely used in conjunction with stochastic gradient descent (SGD). In particular, this work sharply analyzes: (1) mini-batching, a method of averaging many samples of the gradient to both reduce the variance of a stochastic gradient estimate and for parallelizing SGD and (2) tail-averaging, a method involving averaging the final few iterates of SGD in order to decrease the variance in SGD's final iterate. This work presents the first tight non-asymptotic generalization error bounds for these schemes for the stochastic approximation problem of least squares regression. Read More

We give upper and lower bounds on the information-theoretic threshold for community detection in the stochastic block model. Specifically, consider the symmetric stochastic block model with $q$ groups, average degree $d$, and connection probabilities $c_\text{in}/n$ and $c_\text{out}/n$ for within-group and between-group edges respectively; let $\lambda = (c_\text{in}-c_\text{out})/(qd)$. We show that, when $q$ is large, and $\lambda = O(1/q)$, the critical value of $d$ at which community detection becomes possible---in physical terms, the condensation threshold---is \[ d_\text{c} = \Theta\!\left( \frac{\log q}{q \lambda^2} \right) \, , \] with tighter results in certain regimes. Read More

Matrix completion, where we wish to recover a low rank matrix by observing a few entries from it, is a widely studied problem in both theory and practice with wide applications. Most of the provable algorithms so far on this problem have been restricted to the offline setting where they provide an estimate of the unknown matrix using all observations simultaneously. However, in many applications, the online version, where we observe one entry at a time and dynamically update our estimate, is more appealing. Read More

We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$ -- i.e. computing a unit vector $x$ such that $x^T \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$: Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^TA$, we show how to compute an $\epsilon$ approximate top eigenvector in time $\tilde O([nnz(A) + \frac{d*sr(A)}{gap^2} ]* \log 1/\epsilon )$ and $\tilde O([\frac{nnz(A)^{3/4} (d*sr(A))^{1/4}}{\sqrt{gap}} ] * \log 1/\epsilon )$. Read More

This paper considers the problem of canonical-correlation analysis (CCA) (Hotelling, 1936) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics (Shi and Malik, 2000; Hardoon et al., 2004; Witten et al. Read More

This work provides improved guarantees for streaming principle component analysis (PCA). Given $A_1, \ldots, A_n\in \mathbb{R}^{d\times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = \Sigma$ for $\Sigma \succeq \mathbf{0}$, this work provides an $O(d)$-space linear-time single-pass streaming algorithm for estimating the top eigenvector of $\Sigma$. The algorithm nearly matches (and in certain cases improves upon) the accuracy obtained by the standard batch method that computes top eigenvector of the empirical covariance $\frac{1}{n} \sum_{i \in [n]} A_i$ as analyzed by the matrix Bernstein inequality. Read More

We provide faster algorithms and improved sample complexities for approximating the top eigenvector of a matrix. Offline Setting: Given an $n \times d$ matrix $A$, we show how to compute an $\epsilon$ approximate top eigenvector in time $\tilde O ( [nnz(A) + \frac{d \cdot sr(A)}{gap^2}]\cdot \log 1/\epsilon )$ and $\tilde O([\frac{nnz(A)^{3/4} (d \cdot sr(A))^{1/4}}{\sqrt{gap}}]\cdot \log1/\epsilon )$. Here $sr(A)$ is the stable rank and $gap$ is the multiplicative eigenvalue gap. Read More

While there has been a significant amount of work studying gradient descent techniques for non-convex optimization problems over the last few years, all existing results establish either local convergence with good rates or global convergence with highly suboptimal rates, for many problems of interest. In this paper, we take the first step in getting the best of both worlds -- establishing global convergence and obtaining a good rate of convergence for the problem of computing squareroot of a positive definite (PD) matrix, which is a widely studied problem in numerical linear algebra with applications in machine learning and statistics among others. Given a PD matrix $M$ and a PD starting point $U_0$, we show that gradient descent with appropriately chosen step-size finds an $\epsilon$-accurate squareroot of $M$ in $O(\alpha \log (\|M-U_0^2\|_F /\epsilon))$ iterations, where $\alpha = (\max\{\|U_0\|_2^2,\|M\|_2\} / \min \{\sigma_{\min}^2(U_0),\sigma_{\min}(M) \} )^{3/2}$. Read More

An active learner is given a class of models, a large set of unlabeled examples, and the ability to interactively query labels of a subset of these examples; the goal of the learner is to learn a model in the class that fits the data well. Previous theoretical work has rigorously characterized label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC model. In this paper, we shift our attention to a more general setting -- maximum likelihood estimation. Read More

Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Read More

Matrix completion is the problem of recovering a low rank matrix by observing a small fraction of its entries. A series of recent works [KOM12,JNS13,HW14] have proposed fast non-convex optimization based iterative algorithms to solve this problem. However, the sample complexity in all these results is sub-optimal in its dependence on the rank, condition number and the desired accuracy. Read More

We propose a new method for robust PCA -- the task of recovering a low-rank matrix from sparse corruptions that are of unknown value and support. Our method involves alternating between projecting appropriate residuals onto the set of low-rank matrices, and the set of sparse matrices; each projection is {\em non-convex} but easy to compute. In spite of this non-convexity, we establish exact recovery of the low-rank matrix, under the same conditions that are required by existing methods (which are based on convex optimization). Read More

We consider the problem of clustering (or reconstruction) in the stochastic block model, in the regime where the average degree is constant. For the case of two clusters with equal sizes, recent results by Mossel, Neeman and Sly, and by Massoulie, show that reconstructability undergoes a phase transition at the Kesten-Stigum bound of $\lambda_2^2 d = 1$, where $\lambda_2$ is the second largest eigenvalue of a related stochastic matrix and $d$ is the average degree. In this paper, we address the general case of more than two clusters and/or unbalanced cluster sizes. Read More

We consider the problem of sparse coding, where each sample consists of a sparse linear combination of a set of dictionary atoms, and the task is to learn both the dictionary elements and the mixing coefficients. Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed. Typically, the coefficients are estimated via $\ell_1$ minimization, keeping the dictionary fixed, and the dictionary is estimated through least squares, keeping the coefficients fixed. Read More

We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements. Our main result is a strategy to approximately recover the unknown dictionary using an efficient algorithm. Our algorithm is a clustering-style procedure, where each cluster is used to estimate a dictionary element. Read More

Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers) information. More than four decades after it was first proposed, the seminal error reduction algorithm of (Gerchberg and Saxton 1972) and (Fienup 1982) is still the popular choice for solving many variants of this problem. The algorithm is based on alternating minimization; i. Read More

Alternating minimization represents a widely applicable and empirically successful approach for finding low-rank matrices that best fit the given data. For example, for the problem of low-rank matrix completion, this method is believed to be one of the most accurate and efficient, and formed a major component of the winning entry in the Netflix Challenge. In the alternating minimization approach, the low-rank target matrix is written in a bi-linear form, i. Read More

We consider the problem of finding the graph on which an epidemic cascade spreads, given only the times when each node gets infected. While this is a problem of importance in several contexts -- offline and online social networks, e-commerce, epidemiology, vulnerabilities in infrastructure networks -- there has been very little work, analytical or empirical, on finding the graph. Clearly, it is impossible to do so from just one cascade; our interest is in learning the graph from a small number of cascades. Read More

We propose a new yet natural algorithm for learning the graph structure of general discrete graphical models (a.k.a. Read More

Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus our attention on the class of planar Ising models, for which inference is tractable using techniques of statistical physics [Kac and Ward; Kasteleyn]. Read More