# Natesh Pillai - Harvard University

## Contact Details

NameNatesh Pillai |
||

AffiliationHarvard University |
||

CityCambridge |
||

CountryUnited States |
||

## Pubs By Year |
||

## External Links |
||

## Pub CategoriesStatistics - Theory (23) Mathematics - Statistics (23) Mathematics - Probability (21) Statistics - Computation (19) Statistics - Methodology (17) Statistics - Applications (10) Quantitative Biology - Quantitative Methods (2) Mathematics - Dynamical Systems (2) Mathematics - Mathematical Physics (1) Mathematical Physics (1) Computer Science - Computational Complexity (1) Physics - Soft Condensed Matter (1) Mathematics - Numerical Analysis (1) |

## Publications Authored By Natesh Pillai

It is known that reversible Langevin diffusions in confining potentials converge to equilibrium exponentially fast. Adding a divergence free component to the drift of a Langevin diffusion accelerates its convergence to stationarity. However such a stochastic differential equation (SDE) is no longer reversible. Read More

We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide and conquer methods focus exclusively on dividing the total number of observations $n$ into subsamples while keeping the dimension $p$ fixed. Read More

Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be exceedingly expensive when posterior evaluations invoke the evaluation of a computationally expensive model, such as a system of partial differential equations. In recent work [Conrad et al. JASA 2015, arXiv:1402. Read More

As it has become common to use many computer cores in routine applications, finding good ways to parallelize popular algorithms has become increasingly important. In this paper, we present a parallelization scheme for Markov chain Monte Carlo (MCMC) methods based on spectral clustering of the underlying state space, generalizing earlier work on parallelization of MCMC methods by state space partitioning. We show empirically that this approach speeds up MCMC sampling for multimodal distributions and that it can be usefully applied in greater generality than several related algorithms. Read More

Determining the total variation mixing time of Kac's random walk on the special orthogonal group $\mathrm{SO}(n)$ has been a long-standing open problem. In this paper, we construct a novel non-Markovian coupling for bounding this mixing time. The analysis of our coupling entails controlling the smallest singular value of a certain random matrix with highly dependent entries. Read More

Two common concerns raised in analyses of randomized experiments are (i) appropriately handling issues of non-compliance, and (ii) appropriately adjusting for multiple tests (e.g., on multiple outcomes or subgroups). Read More

Two-component mixture priors provide a traditional way to induce sparsity in high-dimensional Bayes models. However, several aspects of such a prior, including computational complexities in high-dimensions, interpretation of exact zeros and non-sparse posterior summaries under standard loss functions, has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians. Interestingly, we demonstrate that many commonly used shrinkage priors, including the Bayesian Lasso, do not have adequate posterior concentration in high-dimensional settings. Read More

Many modern applications collect large sample size and highly imbalanced categorical data, with some categories being relatively rare. Bayesian hierarchical models are well motivated in such settings in providing an approach to borrow information to combat data sparsity, while quantifying uncertainty in estimation. However, a fundamental problem is scaling up posterior computation to massive sample sizes. Read More

It is well known that the ratio of two independent standard Gaussian random variables follows a Cauchy distribution. Any convex combination of independent standard Cauchy random variables also follows a Cauchy distribution. In a recent joint work, the author proved a surprising multivariate generalization of the above facts. Read More

Principal stratification is a widely used framework for addressing post-randomization complications in a principled way. After using principal stratification to define causal effects of interest, researchers are increasingly turning to finite mixture models to estimate these quantities. Unfortunately, standard estimators of the mixture parameters, like the MLE, are known to exhibit pathological behavior. Read More

In this paper we study the asymptotic behavior of the Random-Walk Metropolis algorithm on probability densities with two different `scales', where most of the probability mass is distributed along certain key directions with the `orthogonal' directions containing relatively less mass. Such class of probability measures arise in various applied contexts including Bayesian inverse problems where the posterior measure concentrates on a sub-manifold when the noise variance goes to zero. When the target measure concentrates on a linear sub-manifold, we derive analytically a diffusion limit for the Random-Walk Metropolis Markov chain as the scale parameter goes to zero. Read More

Volume limitations and low yield thresholds of biological fluids have led to widespread use of passive microparticle rheology. The mean-squared-displacement (MSD) statistics of bead position time series (bead paths) are either applied directly to determine the creep compliance [Xu et al (1998)] or transformed to determine dynamic storage and loss moduli [Mason & Weitz (1995)]. A prevalent hurdle arises when there is a non-diffusive experimental drift in the data. Read More

Determining the mixing time of Kac's random walk on the sphere $\mathrm{S}^{n-1}$ is a long-standing open problem. We show that the total variation mixing time of Kac's walk on $\mathrm{S}^{n-1}$ is between $\frac{1}{2} \, n \log(n)$ and $200 \,n \log(n)$. Our bound is thus optimal up to a constant factor, improving on the best-known upper bound of $O(n^{5} \log(n)^{2})$ due to Jiang. Read More

Many finite-state reversible Markov chains can be naturally decomposed into "projection" and "restriction" chains. In this paper we provide bounds on the total variation mixing times of the original chain in terms of the mixing properties of these related chains. This paper is in the tradition of existing bounds on Poincare and log-Sobolev constants of Markov chains in terms of similar decompositions [JSTV04, MR02, MR06, MY09]. Read More

In this paper, we investigate Gaussian process regression models where inputs are subject to measurement error. In spatial statistics, input measurement errors occur when the geographical locations of observed data are not known exactly. Such sources of error are not special cases of "nugget" or microscale variation, and require alternative methods for both interpolation and parameter estimation. Read More

The Cauchy distribution is usually presented as a mathematical curiosity, an exception to the Law of Large Numbers, or even as an "Evil" distribution in some introductory courses. It therefore surprised us when Drton and Xiao (2016) proved the following result for $m=2$ and conjectured it for $m\ge 3$. Let $X= (X_1,. Read More

We study a kinetically constrained Ising process (KCIP) associated with a graph G and density parameter p; this process is an interacting particle system with state space $\{0,1\}^{G}$. The stationary distribution of the KCIP Markov chain is the Binomial($|G|, p$) distribution on the number of particles, conditioned on having at least one particle. The `constraint' in the name of the process refers to the rule that a vertex cannot change its state unless it has at least one neighbour in state `1'. Read More

State-of-the-art techniques in passive particle-tracking microscopy provide high-resolution path trajectories of diverse foreign particles in biological fluids. For particles on the order of 1 micron diameter, these paths are generally inconsistent with simple Brownian motion. Yet, despite an abundance of data confirming these findings and their wide-ranging scientific implications, stochastic modeling of the complex particle motion has received comparatively little attention. Read More

In many modern applications, difficulty in evaluating the posterior density makes performing even a single MCMC step slow. This difficulty can be caused by intractable likelihood functions, but also appears for routine problems with large data sets. Many researchers have responded by running approximate versions of MCMC algorithms. Read More

We analyze the computational efficiency of approximate Bayesian computation (ABC), which approximates a likelihood function by drawing pseudo-samples from the associated model. For the rejection sampling version of ABC, it is known that multiple pseudo-samples cannot substantially increase (and can substantially decrease) the efficiency of the algorithm as compared to employing a high-variance estimate based on a single pseudo-sample. We show that this conclusion also holds for a Markov chain Monte Carlo version of ABC, implying that it is unnecessary to tune the number of pseudo-samples used in ABC-MCMC. Read More

We construct a new framework for accelerating Markov chain Monte Carlo in posterior sampling problems where standard methods are limited by the computational cost of the likelihood, or of numerical models embedded therein. Our approach introduces local approximations of these models into the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Previous efforts at integrating approximate models into inference typically sacrifice either the sampler's exactness or efficiency; our work seeks to address these limitations by exploiting useful convergence characteristics of local approximations. Read More

Penalized regression methods, such as $L_1$ regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. Read More

In the AGEMAP genomics study, researchers were interested in detecting genes related to age in a variety of tissue types. After not finding many age-related genes in some of the analyzed tissue types, the study was criticized for having low power. It is possible that the low power is due to the presence of important unmeasured variables, and indeed we find that a latent factor model appears to explain substantial variability not captured by measured covariates. Read More

Adaptive Markov chains are an important class of Monte Carlo methods for sampling from probability distributions. The time evolution of adaptive algorithms depends on past samples, and thus these algorithms are non-Markovian. Although there has been previous work establishing conditions for their ergodicity, not much is known theoretically about their finite sample properties. Read More

Lung tumor tracking for radiotherapy requires real-time, multiple-step ahead forecasting of a quasi-periodic time series recording instantaneous tumor locations. We introduce a location-mixture autoregressive (LMAR) process that admits multimodal conditional distributions, fast approximate inference using the EM algorithm and accurate multiple-step ahead predictive distributions. LMAR outperforms several commonly used methods in terms of out-of-sample prediction accuracy using clinical data from lung tumor patients. Read More

It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. Read More

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. Read More

We describe a new MCMC method optimized for the sampling of probability measures on Hilbert space which have a density with respect to a Gaussian; such measures arise in the Bayesian approach to inverse problems, and in conditioned diffusions. Our algorithm is based on two key design principles: (i) algorithms which are well-defined in infinite dimensions result in methods which do not suffer from the curse of dimensionality when they are applied to approximations of the infinite dimensional target measure on $\bbR^N$; (ii) non-reversible algorithms can have better mixing properties compared to their reversible counterparts. The method we introduce is based on the hybrid Monte Carlo algorithm, tailored to incorporate these two design principles. Read More

In this paper we construct a framework for doing statistical inference for discretely observed stochastic differential equations (SDEs) where the driving noise has 'memory'. Classical SDE models for inference assume the driving noise to be Brownian motion, or "white noise", thus implying a Markov assumption. We focus on the case when the driving noise is a fractional Brownian motion, which is a common continuous-time modeling device for capturing long-range memory. Read More

We consider the asymptotic consistency of maximum likelihood parameter estimation for dynamical systems observed with noise. Under suitable conditions on the dynamical systems and the observations, we show that maximum likelihood parameter estimation is consistent. Our proof involves ideas from both information theory and dynamical systems. Read More

Penalized regression methods, such as $L_1$ regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. Read More

A framework for causal inference from two-level factorial designs is proposed. The framework utilizes the concept of potential outcomes that lies at the center stage of causal inference and extends Neyman's repeated sampling approach for estimation of causal effects and randomization tests based on Fisher's sharp null hypothesis to the case of 2-level factorial experiments. The framework allows for statistical inference from a finite population, permits definition and estimation of estimands other than "average factorial effects" and leads to more flexible inference procedures than those based on ordinary least squares estimation from a linear model. Read More

Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide theoretical understanding of such Bayesian procedures in terms of posterior convergence rates in inferring high-dimensional covariance matrices where the dimension can be larger than the sample size. Under relevant sparsity assumptions on the true covariance matrix, we show that commonly-used point mass mixture priors on the factor loadings lead to consistent estimation in the operator norm even when $p\gg n$. Read More

The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research. Read More

Let $\widetilde{X}_{M\times N}$ be a rectangular data matrix with independent real-valued entries $[\widetilde{x}_{ij}]$ satisfying $\mathbb {E}\widetilde{x}_{ij}=0$ and $\mathbb {E}\widetilde{x}^2_{ij}=\frac{1}{M}$, $N,M\to\infty$. These entries have a subexponential decay at the tails. We will be working in the regime $N/M=d_N,\lim_{N\to\infty}d_N\neq0,1,\infty$. Read More

In this paper we prove the universality of covariance matrices of the form $H_{N\times N}={X}^{\dagger}X$ where $X$ is an ${M\times N}$ rectangular matrix with independent real valued entries $x_{ij}$ satisfying $\mathbb{E}x_{ij}=0$ and $\mathbb{E}x^2_{ij}={\frac{1}{M}}$, $N$, $M\to \infty$. Furthermore it is assumed that these entries have sub-exponential tails or sufficiently high number of moments. We will study the asymptotics in the regime $N/M=d_N\in(0,\infty),\lim_{N\to\infty}d_N\neq0,\infty$. Read More

We propose a flexible class of models based on scale mixture of uniform distributions to construct shrinkage priors for covariance matrix estimation. This new class of priors enjoys a number of advantages over the traditional scale mixture of normal priors, including its simplicity and flexibility in characterizing the prior density. We also exhibit a simple, easy to implement Gibbs sampler for posterior simulation which leads to efficient estimation in high dimensional problems. Read More

Consider a probability measure on a Hilbert space defined via its density with respect to a Gaussian. The purpose of this paper is to demonstrate that an appropriately defined Markov chain, which is reversible with respect to the measure in question, exhibits a diffusion limit to a noisy gradient flow, also reversible with respect to the same measure. The Markov chain is defined by applying a Metropolis-Hastings accept-reject mechanism to an Ornstein-Uhlenbeck proposal which is itself reversible with respect to the underlying Gaussian measure. Read More

We consider differential equations driven by rough paths and study the regularity of the laws and their long time behavior. In particular, we focus on the case when the driving noise is a rough path valued fractional Brownian motion with Hurst parameter $H\in(\frac{1}{3},\frac{1}{2}]$. Our contribution in this work is twofold. Read More

The Metropolis-adjusted Langevin (MALA) algorithm is a sampling algorithm which makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite dimensional Hilbert space. These natural measures have density with respect to a Gaussian random field measure and arise in many applications such as Bayesian nonparametric statistics and the theory of conditioned diffusions. Read More

**Affiliations:**

^{1}University Paris-Dauphine,

^{2}INRA, Montpellier,

^{3}I3M, Montpellier,

^{4}Harvard University

Approximate Bayesian computation (ABC) have become a essential tool for the analysis of complex stochastic models. Earlier, Grelaud et al. (2009) advocated the use of ABC for Bayesian model choice in the specific case of Gibbs random fields, relying on a inter-model sufficiency property to show that the approximation was legitimate. Read More

**Affiliations:**

^{1}Universite Paris Dauphine,

^{2}Universite de Montpellier 2,

^{3}Harvard University

Approximate Bayesian computation (ABC), also known as likelihood-free methods, have become a favourite tool for the analysis of complex stochastic models, primarily in population genetics but also in financial analyses. We advocated in Grelaud et al. (2009) the use of ABC for Bayesian model choice in the specific case of Gibbs random fields (GRF), relying on a sufficiency property mainly enjoyed by GRFs to show that the approach was legitimate. Read More

Diffusion limits of MCMC methods in high dimensions provide a useful theoretical tool for studying computational complexity. In particular, they lead directly to precise estimates of the number of steps required to explore the target measure, in stationarity, as a function of the dimension of the state space. However, to date such results have mainly been proved for target measures with a product structure, severely limiting their applicability. Read More

We investigate the properties of the Hybrid Monte-Carlo algorithm (HMC) in high dimensions. HMC develops a Markov chain reversible w.r. Read More

We demonstrate that stochastic differential equations (SDEs) driven by fractional Brownian motion with Hurst parameter H > 1/2 have similar ergodic properties as SDEs driven by standard Brownian motion. The focus in this article is on hypoelliptic systems satisfying H\"ormander's condition. We show that such systems satisfy a suitable version of the strong Feller property and we conclude that they admit a unique stationary solution that is physical in the sense that it does not "look into the future". Read More

We consider a simple model for the fluctuating hydrodynamics of a flexible polymer in dilute solution, demonstrating geometric ergodicity for a pair of particles that interact with each other through a nonlinear spring potential while being advected by a stochastic Stokes fluid velocity field. This is a generalization of previous models which have used linear spring forces as well as white-in-time fluid velocity fields. We follow previous work combining control theoretic arguments, Lyapunov functions, and hypo-elliptic diffusion theory to prove exponential convergence via a Harris chain argument. Read More

We consider a family of stochastic processes $\{X_t^\epsilon, t \in T\}$ on a metric space $T$, with a parameter $\epsilon \downarrow 0$. We study the conditions under which \lim_{\e \to 0} \P \Big(\sup_{t \in T} |X_t^\e| < \delta \Big) =1 when one has the \textit{a priori} estimate on the modulus of continuity and the value at one point. We compare our problem to the celebrated Kolmogorov continuity criteria for stochastic processes, and finally give an application of our main result for stochastic intergrals with respect to compound Poisson random measures with infinite intensity measures. Read More