# Aaditya Ramdas

## Contact Details

NameAaditya Ramdas |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesStatistics - Machine Learning (20) Computer Science - Learning (17) Statistics - Theory (10) Mathematics - Statistics (10) Computer Science - Artificial Intelligence (8) Mathematics - Information Theory (6) Computer Science - Information Theory (6) Mathematics - Optimization and Control (5) Mathematics - Numerical Analysis (5) Statistics - Methodology (5) Computer Science - Numerical Analysis (3) Mathematics - Probability (2) Statistics - Applications (2) Computer Science - Discrete Mathematics (1) Computer Science - Neural and Evolutionary Computing (1) Computer Science - Data Structures and Algorithms (1) |

## Publications Authored By Aaditya Ramdas

A significant literature has arisen to study ways to employing prior knowledge to improve power and precision of multiple testing procedures. Some common forms of prior knowledge may include (a) a priori beliefs about which hypotheses are null, modeled by non-uniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) partitions of the hypotheses into known groups, indicating (dis)similarity of hypotheses; and (d) knowledge of independence, positive dependence or arbitrary dependence between hypotheses or groups, allowing for more aggressive or conservative procedures. We present a general framework for global null testing and false discovery rate (FDR) control that allows the scientist to incorporate all four types of prior knowledge (a)-(d) simultaneously. Read More

We consider the problem of decoding a discrete signal of categorical variables from the observation of several histograms of pooled subsets of it. We present an Approximate Message Passing (AMP) algorithm for recovering the signal in the random dense setting where each observed histogram involves a random subset of entries of size proportional to n. We characterize the performance of the algorithm in the asymptotic regime where the number of observations $m$ tends to infinity proportionally to n, by deriving the corresponding State Evolution (SE) equations and studying their dynamics. Read More

Stochastic iterative algorithms such as the Kaczmarz and Gauss-Seidel methods have gained recent attention because of their speed, simplicity, and the ability to approximately solve large-scale linear systems of equations without needing to access the entire matrix. In this work, we consider the setting where we wish to solve a linear system in a large matrix X that is stored in a factorized form, X = UV; this setting either arises naturally in many applications or may be imposed when working with large low-rank datasets for reasons of space required for storage. We propose a variant of the randomized Kaczmarz method for such systems that takes advantage of the factored form, and avoids computing X. Read More

Consider a population consisting of n individuals, each of whom has one of d types (e.g. their blood type, in which case d=4). Read More

We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GAN), in which a model attempts to generate realistic samples, and a discriminator attempts to tell these apart from data samples. In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples. Read More

Slow mixing is the central hurdle when working with Markov chains, especially those used for Monte Carlo approximations (MCMC). In many applications, it is only of interest to estimate the stationary expectations of a small set of functions, and so the usual definition of mixing based on total variation convergence may be too conservative. Accordingly, we introduce function-specific analogs of mixing times and spectral gaps, and use them to prove Hoeffding-like function-specific concentration inequalities. Read More

Kernel methods provide an attractive framework for aggregating and learning from ranking data, and so understanding the fundamental properties of kernels over permutations is a question of broad interest. We provide a detailed analysis of the Fourier spectra of the standard Kendall and Mallows kernels, and a new class of polynomial-type kernels. We prove that the Kendall kernel has exactly two irreducible representations at which the Fourier transform is non-zero, and moreover, the associated matrices are rank one. Read More

Given a weighted graph with $N$ vertices, consider a real-valued regression problem in a semi-supervised setting, where one observes $n$ labeled vertices, and the task is to label the remaining ones. We present a theoretical study of $\ell_p$-based Laplacian regularization under a $d$-dimensional geometric random graph model. We provide a variational characterization of the performance of this regularized learner as $N$ grows to infinity while $n$ stays constant, the associated optimality conditions lead to a partial differential equation that must be satisfied by the associated function estimate $\hat{f}$. Read More

When data analysts train a classifier and check if its accuracy is significantly different from random guessing, they are implicitly and indirectly performing a hypothesis test (two sample testing) and it is of importance to ask whether this indirect method for testing is statistically optimal or not. Given that hypothesis tests attempt to maximize statistical power subject to a bound on the allowable false positive rate, while prediction attempts to minimize statistical risk on future predictions on unseen data, we wish to study whether a predictive approach for an ultimate aim of testing is prudent. We formalize this problem by considering the two-sample mean-testing setting where one must determine if the means of two Gaussians (with known and equal covariance) are the same or not, but the analyst indirectly does so by checking whether the accuracy achieved by Fisher's LDA classifier is significantly different from chance or not. Read More

Linear independence testing is a fundamental information-theoretic and statistical problem that can be posed as follows: given $n$ points $\{(X_i,Y_i)\}^n_{i=1}$ from a $p+q$ dimensional multivariate distribution where $X_i \in \mathbb{R}^p$ and $Y_i \in\mathbb{R}^q$, determine whether $a^T X$ and $b^T Y$ are uncorrelated for every $a \in \mathbb{R}^p, b\in \mathbb{R}^q$ or not. We give minimax lower bound for this problem (when $p+q,n \to \infty$, $(p+q)/n \leq \kappa < \infty$, without sparsity assumptions). In summary, our results imply that $n$ must be at least as large as $\sqrt {pq}/\|\Sigma_{XY}\|_F^2$ for any procedure (test) to have non-trivial power, where $\Sigma_{XY}$ is the cross-covariance matrix of $X,Y$. Read More

In many practical applications of multiple hypothesis testing using the False Discovery Rate (FDR), the given hypotheses can be naturally partitioned into groups, and one may not only want to control the number of false discoveries (wrongly rejected null hypotheses), but also the number of falsely discovered groups of hypotheses (we say a group is falsely discovered if at least one hypothesis within that group is rejected, when in reality the group contains only nulls). In this paper, we introduce the p-filter, a procedure which unifies and generalizes the standard FDR procedure by Benjamini and Hochberg and global null testing procedure by Simes. We first prove that our proposed method can simultaneously control the overall FDR at the finest level (individual hypotheses treated separately) and the group FDR at coarser levels (when such groups are user-specified). Read More

Nonparametric two sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having being intelligently designed and analyzed, both for the unidimensional and the multivariate setting. Our contribution is to tie together many of these tests, drawing connections between seemingly very different statistics. Read More

Nonparametric two sample testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the most common settings as mean difference alternatives (MDA), for testing differences only in first moments, and general difference alternatives (GDA), which is about testing for any difference in distributions. A large number of test statistics have been proposed for both these settings. Read More

The Kaczmarz and Gauss-Seidel methods aim to solve a linear $m \times n$ system $\boldsymbol{X} \boldsymbol{\beta} = \boldsymbol{y}$ by iteratively refining the solution estimate; the former uses random rows of $\boldsymbol{X}$ {to update $\boldsymbol{\beta}$ given the corresponding equations} and the latter uses random columns of $\boldsymbol{X}$ {to update corresponding coordinates in $\boldsymbol{\beta}$}. Interest in these methods was recently revitalized by a proof of Strohmer and Vershynin showing linear convergence in expectation for a \textit{randomized} Kaczmarz method variant (RK), and a similar result for the randomized Gauss-Seidel algorithm (RGS) was later proved by Lewis and Leventhal. Recent work unified the analysis of these algorithms for the overcomplete and undercomplete systems, showing convergence to the ordinary least squares (OLS) solution and the minimum Euclidean norm solution respectively. Read More

We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Read More

We propose a new algorithmic framework for sequential hypothesis testing with i.i.d. Read More

We focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS) both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes - the Perceptron (primal) and Von-Neumann (dual) algorithms. We cast our problem as one of maximizing the regularized normalized hard-margin ($\rho$) in an RKHS and %use the Representer Theorem to rephrase it in terms of a Mahalanobis dot-product/semi-norm associated with the kernel's (normalized and signed) Gram matrix. We derive an accelerated smoothed algorithm with a convergence rate of $\tfrac{\sqrt {\log n}}{\rho}$ given $n$ separable points, which is strikingly similar to the classical kernelized Perceptron algorithm whose rate is $\tfrac1{\rho^2}$. Read More

Interesting theoretical associations have been established by recent papers between the fields of active learning and stochastic convex optimization due to the common role of feedback in sequential querying mechanisms. In this paper, we continue this thread in two parts by exploiting these relations for the first time to yield novel algorithms in both fields, further motivating the study of their intersection. First, inspired by a recent optimization algorithm that was adaptive to unknown uniform convexity parameters, we present a new active learning algorithm for one-dimensional thresholds that can yield minimax rates by adapting to unknown noise parameters. Read More

In active learning, the user sequentially chooses values for feature $X$ and an oracle returns the corresponding label $Y$. In this paper, we consider the effect of feature noise in active learning, which could arise either because $X$ itself is being measured, or it is corrupted in transmission to the oracle, or the oracle returns the label of a noisy version of the query point. In statistics, feature noise is known as "errors in variables" and has been studied extensively in non-active settings. Read More

The Kaczmarz and Gauss-Seidel methods both solve a linear system $\bf{X}\bf{\beta} = \bf{y}$ by iteratively refining the solution estimate. Recent interest in these methods has been sparked by a proof of Strohmer and Vershynin which shows the randomized Kaczmarz method converges linearly in expectation to the solution. Lewis and Leventhal then proved a similar result for the randomized Gauss-Seidel algorithm. Read More

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (\textit{general} alternatives), and those which are designed to specifically test easier alternatives, like a difference in means (\textit{mean-shift} alternatives). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Read More

This paper is about randomized iterative algorithms for solving a linear system of equations $X \beta = y$ in different settings. Recent interest in the topic was reignited when Strohmer and Vershynin (2009) proved the linear convergence rate of a Randomized Kaczmarz (RK) algorithm that works on the rows of $X$ (data points). Following that, Leventhal and Lewis (2010) proved the linear convergence of a Randomized Coordinate Descent (RCD) algorithm that works on the columns of $X$ (features). Read More

Given a matrix $A$, a linear feasibility problem (of which linear classification is a special case) aims to find a solution to a primal problem $w: A^Tw > \textbf{0}$ or a certificate for the dual problem which is a probability distribution $p: Ap = \textbf{0}$. Inspired by the continued importance of "large-margin classifiers" in machine learning, this paper studies a condition measure of $A$ called its \textit{margin} that determines the difficulty of both the above problems. To aid geometrical intuition, we first establish new characterizations of the margin in terms of relevant balls, cones and hulls. Read More

This paper presents a fast and robust algorithm for trend filtering, a recently developed nonparametric regression tool. It has been shown that, for estimating functions whose derivatives are of bounded variation, trend filtering achieves the minimax optimal error rate, while other popular methods like smoothing splines and kernels do not. Standing in the way of a more widespread practical adoption, however, is a lack of scalable and numerically stable algorithms for fitting trend filtering estimates. Read More

This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to the above belief. Read More

This paper deals with the problem of nonparametric independence testing, a fundamental decision-theoretic problem that asks if two arbitrary (possibly multivariate) random variables $X,Y$ are independent or not, a question that comes up in many fields like causality and neuroscience. While quantities like correlation of $X,Y$ only test for (univariate) linear independence, natural alternatives like mutual information of $X,Y$ are hard to estimate due to a serious curse of dimensionality. A recent approach, avoiding both issues, estimates norms of an \textit{operator} in Reproducing Kernel Hilbert Spaces (RKHSs). Read More

Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. Read More

We focus on the problem of minimizing a convex function $f$ over a convex set $S$ given $T$ queries to a stochastic first order oracle. We argue that the complexity of convex minimization is only determined by the rate of growth of the function around its minimizer $x^*_{f,S}$, as quantified by a Tsybakov-like noise condition. Specifically, we prove that if $f$ grows at least as fast as $\|x-x^*_{f,S}\|^\kappa$ around its minimum, for some $\kappa > 1$, then the optimal rate of learning $f(x^*_{f,S})$ is $\Theta(T^{-\frac{\kappa}{2\kappa-2}})$. Read More