# Jonathan Taylor

## Contact Details

Name: Jonathan Taylor

## Pub Categories

- Statistics - Methodology (27)
- Mathematics - Statistics (23)
- Statistics - Theory (23)
- Statistics - Machine Learning (9)
- Mathematics - Probability (7)
- Computer Science - Learning (5)
- Statistics - Applications (3)
- Mathematics - Differential Geometry (2)
- Physics - Optics (2)
- Computer Science - Artificial Intelligence (2)
- Computer Science - Neural and Evolutionary Computing (2)
- Statistics - Computation (1)
- Mathematics - Optimization and Control (1)
- Mathematics - Algebraic Topology (1)
- Computer Science - Computer Vision and Pattern Recognition (1)
- Physics - Soft Condensed Matter (1)

## Publications Authored By Jonathan Taylor

We describe a way to construct hypothesis tests and confidence intervals after having used the Lasso for feature selection, allowing the regularization parameter to be chosen via an estimate of prediction error. Our estimate of prediction error is a slight variation on cross-validation. Using this variation, we are able to describe an appropriate selection event for choosing a parameter by cross-validation. Read More

The current work proposes a Monte Carlo free alternative to inference post randomized selection algorithms with a convex loss and a convex penalty. The pivots based on the selective law, which is truncated to all selected realizations, typically lack closed-form expressions in randomized settings. Inference in these settings relies upon standard Monte Carlo sampling techniques, which can prove to be unstable for parameters far from the chosen reference distribution. Read More

Adopting the Bayesian methodology of adjusting for selection to provide valid inference in Panigrahi (2016), the current work proposes an approximation to a selective posterior, post randomized queries on data. Such a posterior differs from the usual one as it involves a truncated likelihood prepended with a prior belief on parameters in a Bayesian model. The truncation, imposed by selection, leads to intractability of the selective posterior, thereby posing a technical hurdle in sampling from such a posterior. Read More

Recently, Tian Harris and Taylor (2015) proposed an asymptotically pivotal test statistic valid post selection with a randomized response. In this work, we relax the more restrictive local alternatives assumption, thereby allowing for rare selection events, to improve upon their selective CLT result for heavier tailed randomizations. We also show that under the local alternatives assumption on the parameter, selective CLT holds for Gaussian randomization as well. Read More

We study machine learning formulations of inductive program synthesis; that is, given input-output examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domain-specific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. Read More

We consider the problem of selective inference after solving a (randomized) convex statistical learning program in the form of a penalized or constrained loss function. Our first main result is a change-of-measure formula that describes many conditional sampling problems of interest in selective inference. Our approach is model-agnostic in the sense that users may provide their own statistical model for inference; we simply provide the modification of each distribution in the model after the selection. Read More

We study machine learning formulations of inductive program synthesis; given input-output examples, we try to synthesize source code that maps inputs to corresponding outputs. Our aims are to develop new machine learning approaches based on neural networks and graphical models, and to understand the capabilities of machine learning techniques relative to traditional alternatives, such as those based on constraint solving from the programming languages community. Our key contribution is the proposal of TerpreT, a domain-specific language for expressing program synthesis problems. Read More

As an alternative to conventional multi-pixel cameras, single-pixel cameras enable images to be recorded using a single detector that measures the correlations between the scene and a set of patterns. However, to fully sample a scene in this way requires at least the same number of correlation measurements as there are pixels in the reconstructed image. Therefore single-pixel imaging systems typically exhibit low frame-rates. Read More
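The sampling requirement described above can be illustrated with a toy simulation. This is a minimal sketch, not the method of the paper: it assumes an idealized noise-free detector and ±1 Hadamard patterns (real systems typically display binary masks), and the names `hadamard`, `measure`, and `reconstruct` are hypothetical. With as many patterns as pixels, the scene is recovered exactly from the correlation readings.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def measure(scene, patterns):
    """One detector reading per pattern: the correlation of pattern and scene."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]


def reconstruct(readings, patterns):
    """Invert the measurements using orthogonality: H^T H = n I."""
    n = len(patterns[0])
    return [
        sum(r * pattern[i] for r, pattern in zip(readings, patterns)) / n
        for i in range(n)
    ]


scene = [0.0, 1.0, 0.5, 0.25]      # a 4-pixel "image", flattened
patterns = hadamard(len(scene))    # 4 patterns for 4 pixels
readings = measure(scene, patterns)
recovered = reconstruct(readings, patterns)
```

Passing fewer patterns than pixels to `reconstruct` still returns an image, but only an approximation of the scene, which is the trade-off between frame rate and fidelity that the abstract alludes to.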

We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information, and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Read More

Selective inference is a recent research topic that tries to perform valid inference after using the data to select a reasonable statistical model. We propose MAGIC, a new method for selective inference that is general, powerful and tractable. MAGIC is a method for selective inference after solving a convex optimization problem with smooth loss and $\ell_1$ penalty. Read More

We provide Bayesian inference for a linear model selected after observing the data. Adopting the ideas of Yekutieli (2012), the Bayesian model consists of a prior and a truncated likelihood. The resulting posterior distribution, unlike in the setup usually considered when performing Bayesian variable selection, is affected by the very fact that selection was applied. Read More

We present a new method for post-selection inference for L1 (lasso)-penalized likelihood models, including generalized regression models. Our approach generalizes the post-selection framework presented in Lee et al (2014). The method provides p-values and confidence intervals that are asymptotically valid, conditional on the inherent selection done by the lasso. Read More

We study the a.s. convergence of a sequence of random embeddings of a fixed manifold into Euclidean spaces of increasing dimensions. Read More

Many model selection algorithms produce a path of fits specifying a sequence of increasingly complex models. Given such a sequence and the data used to produce them, we consider the problem of choosing the least complex model that is not falsified by the data. Extending the selected-model tests of Fithian et al. Read More

Applied statistical problems often come with pre-specified groupings to predictors. It is natural to test for the presence of simultaneous group-wide signal for groups in isolation, or for multiple groups together. Classical tests for the presence of such signals rely either on tests for the omission of the entire block of variables (the classical F-test) or on the creation of an unsupervised prototype for the group (either a group centroid or first principal component) and subsequent t-tests on these prototypes. Read More

We provide a general mathematical framework for selective inference with supervised model selection procedures characterized by quadratic forms in the outcome variable. Forward stepwise with groups of variables is an important special case as it allows models with categorical variables or factors. Models can be chosen by AIC, BIC, or a fixed number of steps. Read More

A common goal in modern biostatistics is to form a biomarker signature from high dimensional gene expression data that is predictive of some outcome of interest. After learning this biomarker signature, an important question to answer is how well it predicts the response compared to classical predictors. This is challenging, because the biomarker signature is an internal predictor -- one that has been learned using the same dataset on which we want to evaluate its significance. Read More

Inspired by sample splitting and the reusable holdout introduced in the field of differential privacy, we consider selective inference with a randomized response. We discuss two major advantages of using a randomized response for model selection. First, the selectively valid tests are more powerful after randomized selection. Read More

There has been much recent work on inference after model selection when the noise level is known; however, $\sigma$ is rarely known in practice and its estimation is difficult in high-dimensional settings. In this work we propose using the square-root LASSO (also known as the scaled LASSO) to perform selective inference for the coefficients and the noise level simultaneously. The square-root LASSO has the property that choosing a reasonable tuning parameter is scale-free, namely it does not depend on the noise level in the data. Read More
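The scale-free property can be seen directly from the objective. The following is a sketch under one common normalization of the square-root LASSO, which is an assumption here; the paper may scale the loss differently.

```latex
\hat\beta(\lambda; y) \in \arg\min_{\beta}\; \|y - X\beta\|_2 + \lambda \|\beta\|_1 .
% For any c > 0, the objective evaluated at (cy, c\beta) is c times the objective at (y, \beta):
\|cy - X(c\beta)\|_2 + \lambda \|c\beta\|_1
  = c \left( \|y - X\beta\|_2 + \lambda \|\beta\|_1 \right),
% so c\,\hat\beta(\lambda; y) minimizes the rescaled problem with the same \lambda.
% The ordinary LASSO loss \tfrac{1}{2}\|y - X\beta\|_2^2 scales as c^2 instead,
% so its \lambda would have to be multiplied by c.
```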

We devise a one-shot approach to distributed sparse regression in the high-dimensional setting. The key idea is to average "debiased" or "desparsified" lasso estimators. We show the approach converges at the same rate as the lasso as long as the dataset is not split across too many machines. Read More

Motivated by questions of manifold learning, we study a sequence of random manifolds, generated by embedding a fixed, compact manifold $M$ into Euclidean spheres of increasing dimension via a sequence of Gaussian mappings. One of the fundamental smoothness parameters of manifold learning theorems is the reach, or critical radius, of $M$. Roughly speaking, the reach is a measure of a manifold's departure from convexity, which incorporates both local curvature and global topology. Read More

In this paper, we seek to establish asymptotic results for selective inference procedures removing the assumption of Gaussianity. The class of selection procedures we consider are determined by affine inequalities, which we refer to as affine selection procedures. Examples of affine selection procedures include post-selection inference along the solution path of the LASSO, as well as post-selection inference after fitting the LASSO at a fixed value of the regularization parameter. Read More

Principal component analysis (PCA) is a well-known tool in multivariate statistics. One significant challenge in using PCA is the choice of the number of components. In order to address this challenge, we propose an exact distribution-based method for hypothesis testing and construction of confidence intervals for signals in a noisy matrix. Read More

To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. Read More
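A toy simulation can make the notion of selective type I error concrete. This is a minimal sketch under assumptions that are not from the paper: a single observation $y \sim N(0,1)$ under the null, a hypothetical selection rule "test only if $y > c$", and a one-sided test. The naive p-value ignores selection, while the selective p-value uses the null law of $y$ truncated to $y > c$.

```python
import math
import random


def normal_sf(x):
    """Upper tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def naive_p(y):
    """One-sided p-value that ignores the selection step."""
    return normal_sf(y)


def selective_p(y, c):
    """One-sided p-value under the null law of y conditioned on y > c."""
    return normal_sf(y) / normal_sf(c)


def simulate(n_draws=200_000, c=1.5, alpha=0.05, seed=0):
    """Rejection rates among the tests that were actually performed."""
    rng = random.Random(seed)
    selected = naive_rejections = selective_rejections = 0
    for _ in range(n_draws):
        y = rng.gauss(0.0, 1.0)   # the null hypothesis is true
        if y <= c:
            continue              # hypothesis never tested
        selected += 1
        naive_rejections += naive_p(y) < alpha
        selective_rejections += selective_p(y, c) < alpha
    return naive_rejections / selected, selective_rejections / selected


naive_rate, selective_rate = simulate()
```

In this setup the naive rejection rate among performed tests lands far above `alpha`, while the selective rate stays close to it, which is the error rate the abstract proposes to control.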

We introduce a consistent estimator for the homology (an algebraic structure representing connected components and cycles) of level sets of both density and regression functions. Our method is based on kernel estimation. We apply this procedure to two problems: (1) inferring the homology structure of manifolds from noisy observations, (2) inferring the persistent homology (a multi-scale extension of homology) of either density or regression functions. Read More

Rejoinder of "A significance test for the lasso" by Richard Lockhart, Jonathan Taylor, Ryan J. Tibshirani, Robert Tibshirani [arXiv:1301.7161]. Read More

We apply the methods developed by Lockhart et al. (2013) and Taylor et al. (2013) on significance tests for penalized regression to forward stepwise model selection. Read More

We tackle the problem of the estimation of a vector of means from a single vector-valued observation $y$. Whereas previous work reduces the size of the estimates for the largest (absolute) sample elements via shrinkage (like James-Stein) or biases estimated via empirical Bayes methodology, we take a novel approach. We adapt recent developments by Lee et al (2013) in post selection inference for the Lasso to the orthogonal setting, where sample elements have different underlying signal sizes. Read More

Two-step estimators are often called upon to fit censored regression models in many areas of science and engineering. Since censoring incurs a bias in the naive least-squares fit, a two-step estimator first estimates the bias and then fits a corrected linear model. We develop a framework for performing valid post-correction inference with two-step estimators. Read More

We develop a framework for post model selection inference, via marginal screening, in linear regression. At the core of this framework is a result that characterizes the exact distribution of linear functions of the response $y$, conditional on the model being selected ("condition on selection" framework). This allows us to construct valid confidence intervals and hypothesis tests for regression coefficients that account for the selection procedure. Read More

We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general scheme to perform valid inference after any selection event that can be characterized as y falling into a polyhedral set. This framework allows us to derive conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or any step along the lasso regularization path, because, as it turns out, selection events for these procedures can be expressed as polyhedral constraints on y. Read More
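The polyhedral scheme described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes $y \sim N(\mu, \sigma^2 I)$, a selection event written as $\{y : Ay \le b\}$, and a one-sided test of $\eta^T \mu = 0$. The function names and the toy selection event are hypothetical. Conditional on the event and on the part of $y$ orthogonal to $\eta$, the statistic $\eta^T y$ is a Gaussian truncated to an interval determined by $A$, $b$, and $y$.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, with explicit handling of infinite endpoints."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def truncation_interval(A, b, y, eta):
    """Return (eta'y, lower, upper): the range of eta'y compatible with Ay <= b."""
    stat = dot(eta, y)
    c = [e / dot(eta, eta) for e in eta]            # direction along eta
    z = [yi - ci * stat for yi, ci in zip(y, c)]    # part of y orthogonal to eta
    lower, upper = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac = dot(row, c)
        slack = bj - dot(row, z)
        if ac < 0:
            lower = max(lower, slack / ac)
        elif ac > 0:
            upper = min(upper, slack / ac)
    return stat, lower, upper


def truncated_gaussian_pvalue(stat, lower, upper, sd):
    """P(T >= stat) for T ~ N(0, sd^2) truncated to [lower, upper]."""
    numerator = normal_cdf(upper / sd) - normal_cdf(stat / sd)
    denominator = normal_cdf(upper / sd) - normal_cdf(lower / sd)
    return numerator / denominator


# Toy event: coordinate 0 is positive and largest in absolute value,
# i.e. y0 >= y1 and y0 >= -y1, written as A y <= b.
A = [[-1.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0]
y = [2.0, 0.5]
eta = [1.0, 0.0]
sigma = 1.0

stat, lower, upper = truncation_interval(A, b, y, eta)
p_value = truncated_gaussian_pvalue(stat, lower, upper, sigma * math.sqrt(dot(eta, eta)))
```

Differences of CDF values lose accuracy far in the tails, so a production version would need a more careful evaluation of the truncated Gaussian probabilities.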

We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form valid confidence intervals for the selected coefficients and test whether all relevant variables have been included in the model. Read More

It has been over 200 years since Gauss's and Legendre's famous priority dispute on who discovered the method of least squares. Nevertheless, we argue that the normal equations are still relevant in many facets of modern statistics, particularly in the domain of high-dimensional inference. Even today, we are still learning new things about the law of large numbers, first described in Bernoulli's Ars Conjectandi 300 years ago, as it applies to high dimensional inference. Read More

We introduce Gaussian-type measures on the manifold of all metrics with a fixed volume form on a compact Riemannian manifold of dimension $\geq 3$. For this random model we compute the characteristic function for the $L^2$ (Ebin) distance to the reference metric. In the Appendix, we study Lipschitz-type distance between Riemannian metrics and give applications to the diameter, eigenvalue and volume entropy functionals. Read More

We propose a theoretical framework to predict the three-dimensional shapes of optically deformed micron-sized emulsion droplets with ultra-low interfacial tension. The resulting shape and size of the droplet arises out of a balance between the interfacial tension and optical forces. Using an approximation of the laser field as a Gaussian beam, working within the Rayleigh-Gans regime and assuming isotropic surface energy at the oil-water interface, we numerically solve the resulting shape equations to elucidate the three-dimensional droplet geometry. Read More

We derive an exact p-value for testing a global null hypothesis in a general adaptive regression problem. The general approach uses the Kac-Rice formula, as described in (Adler & Taylor 2007). The resulting formula is exact in finite samples, requiring only Gaussianity of the errors. Read More

We consider tests of significance in the setting of the graphical lasso for inverse covariance matrix estimation. We propose a simple test statistic based on a subsequence of the knots in the graphical lasso path. We show that this statistic has an exponential asymptotic null distribution, under the null hypothesis that the model contains the true connected components. Read More

Regularized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Usually the low-dimensional structure is encoded by the presence of the (unknown) parameters in some low-dimensional model subspace. In such settings, it is desirable for estimates of the model parameters to be model selection consistent: the estimates also fall in the model subspace. Read More

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an $\operatorname {Exp}(1)$ asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i. Read More

Our problem is to find a good approximation to the P-value of the maximum of a random field of test statistics for a cone alternative at each point in a sample of Gaussian random fields. These test statistics have been proposed in the neuroscience literature for the analysis of fMRI data allowing for unknown delay in the hemodynamic response. However, the null distribution of the maximum of this 3D random field of test statistics, and hence the threshold used to detect brain activation, was unknown. Read More

We add a set of convex constraints to the lasso to produce sparse interaction models that honor the hierarchy restriction that an interaction only be included in a model if one or both variables are marginally important. We give a precise characterization of the effect of this hierarchy constraint, prove that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A bound on this estimate reveals the amount of fitting "saved" by the hierarchy constraint. Read More

We provide a new approach, along with extensions, to results in two important papers of Worsley, Siegmund and coworkers closely tied to the statistical analysis of fMRI (functional magnetic resonance imaging) brain data. These papers studied approximations for the exceedance probabilities of scale and rotation space random fields, the latter playing an important role in the statistical analysis of fMRI data. The techniques used there came either from the Euler characteristic heuristic or via tube formulae, and to a large extent were carefully attuned to the specific examples of the paper. Read More

We derive the degrees of freedom of the lasso fit, placing no assumptions on the predictor matrix $X$. Like the well-known result of Zou, Hastie and Tibshirani [Ann. Statist. Read More

Multivariate machine learning methods are increasingly used to analyze neuroimaging data, often replacing more traditional "mass univariate" techniques that fit data one voxel at a time. In the functional magnetic resonance imaging (fMRI) literature, this has led to broad application of "off-the-shelf" classification and regression methods. These generic approaches allow investigators to use ready-made algorithms to accurately decode perceptual, cognitive, or behavioral states from distributed patterns of neural activity. Read More

In this work we consider infinite dimensional extensions of some finite dimensional Gaussian geometric functionals called the Gaussian Minkowski functionals. These functionals appear as coefficients in the probability content of a tube around a convex set $D\subset\mathbb{R}^k$ under the standard Gaussian law $N(0,I_{k\times k})$. Using these infinite dimensional extensions, we consider geometric properties of some smooth random fields in the spirit of [Random Fields and Geometry (2007) Springer] that can be expressed in terms of reasonably smooth Wiener functionals. Read More

Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships often resulting in poor performance. We propose a generalization of the singular value decomposition (SVD) and principal components analysis (PCA) that is appropriate for massive data sets with structured variables or known two-way dependencies. Read More

We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui et al (2010) propose "SAFE" rules that guarantee that a coefficient will be zero in the solution, based on the inner products of each predictor with the outcome. In this paper we propose strong rules that are not foolproof but rarely fail in practice. Read More
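The two kinds of screening rule can be compared on a small example. This is a sketch of the basic (non-sequential) forms of the rules for the lasso objective $\tfrac12\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$ with $\lambda \le \lambda_{\max} = \max_j |x_j^T y|$. The function name `screen` and the toy data are hypothetical. The basic SAFE rule discards predictor $j$ when $|x_j^T y| < \lambda - \|x_j\|_2 \|y\|_2 (\lambda_{\max} - \lambda)/\lambda_{\max}$, and the basic strong rule discards it when $|x_j^T y| < 2\lambda - \lambda_{\max}$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def screen(columns, y, lam):
    """Return (strong, safe): indices of predictors each basic rule discards.

    `columns` holds the predictor columns x_j of the design matrix.
    """
    scores = [abs(dot(col, y)) for col in columns]
    lam_max = max(scores)
    y_norm = math.sqrt(dot(y, y))
    strong, safe = [], []
    for j, (col, score) in enumerate(zip(columns, scores)):
        # Strong rule: a heuristic bound, not guaranteed to be correct.
        if score < 2.0 * lam - lam_max:
            strong.append(j)
        # SAFE rule: guaranteed, but usually discards fewer predictors.
        col_norm = math.sqrt(dot(col, col))
        if score < lam - col_norm * y_norm * (lam_max - lam) / lam_max:
            safe.append(j)
    return strong, safe


# Toy data with three orthonormal predictors (an assumption for illustration).
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.59, 0.2]
strong_discards, safe_discards = screen(columns, y, lam=1.8)
```

Here the strong rule discards predictors 1 and 2 while the SAFE rule discards only predictor 2. Because the strong rule can occasionally discard an active predictor, a solver that uses it needs to check the KKT conditions on the discarded set afterwards.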

We present a path algorithm for the generalized lasso problem. This problem penalizes the $\ell_1$ norm of a matrix D times the coefficient vector, and has a wide range of applications, dictated by the choice of D. Our algorithm is based on solving the dual of the generalized lasso, which greatly facilitates computation of the path. Read More
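How the choice of D shapes the penalty can be shown with two standard examples. This sketch only evaluates the penalty $\|D\beta\|_1$ and does not implement the path algorithm; the helper names are hypothetical. Taking D to be the identity recovers the ordinary lasso penalty, and taking D to be the first-difference matrix gives the one-dimensional fused lasso penalty, which charges only for jumps between neighboring coefficients.

```python
def identity_matrix(p):
    """D = I: the generalized lasso penalty reduces to the lasso penalty."""
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


def first_difference_matrix(p):
    """D with rows (..., -1, 1, ...): the 1d fused lasso penalty."""
    return [
        [-1.0 if j == i else 1.0 if j == i + 1 else 0.0 for j in range(p)]
        for i in range(p - 1)
    ]


def generalized_penalty(D, beta):
    """The l1 norm of D times beta."""
    return sum(abs(sum(d * b for d, b in zip(row, beta))) for row in D)


beta = [1.0, 1.0, 3.0, 3.0]   # piecewise-constant coefficient vector
lasso_penalty = generalized_penalty(identity_matrix(4), beta)
fused_penalty = generalized_penalty(first_difference_matrix(4), beta)
```

For this piecewise-constant `beta` the lasso penalty sums all four magnitudes, whereas the fused penalty counts only the single jump from 1 to 3.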

We consider smooth, infinitely divisible random fields $(X(t),t\in M)$, $M\subset {\mathbb{R}}^d$, with regularly varying Lévy measure, and are interested in the geometric characteristics of the excursion sets \[A_u=\{t\in M:X(t)>u\}\] over high levels $u$. For a large class of such random fields, we compute the $u\to\infty$ asymptotic joint distribution of the numbers of critical points, of various types, of $X$ in $A_u$, conditional on $A_u$ being nonempty. This allows us, for example, to obtain the asymptotic conditional distribution of the Euler characteristic of the excursion set. Read More

This article presents maximum likelihood estimators (MLEs) and log-likelihood ratio (LLR) tests for the eigenvalues and eigenvectors of Gaussian random symmetric matrices of arbitrary dimension, where the observations are independent repeated samples from one or two populations. These inference problems are relevant in the analysis of diffusion tensor imaging data and polarized cosmic background radiation data, where the observations are, respectively, $3\times3$ and $2\times2$ symmetric positive definite matrices. The parameter sets involved in the inference problems for eigenvalues and eigenvectors are subsets of Euclidean space that are either affine subspaces, embedded submanifolds that are invariant under orthogonal transformations or polyhedral convex cones. Read More