# An introduction to sampling via measure transport

We present the fundamentals of a measure transport approach to sampling. The idea is to construct a deterministic coupling---i.e., a transport map---between a complex "target" probability measure of interest and a simpler reference measure. Given a transport map, one can generate arbitrarily many independent and unweighted samples from the target simply by pushing forward reference samples through the map. We consider two different and complementary scenarios: first, when only evaluations of the unnormalized target density are available, and second, when the target distribution is known only through a finite collection of samples. We show that in both settings the desired transports can be characterized as the solutions of variational problems. We then address practical issues associated with the optimization--based construction of transports: choosing finite-dimensional parameterizations of the map, enforcing monotonicity, quantifying the error of approximate transports, and refining approximate transports by enriching the corresponding approximation spaces. Approximate transports can also be used to "Gaussianize" complex distributions and thus precondition conventional asymptotically exact sampling schemes. We place the measure transport approach in broader context, describing connections with other optimization--based samplers, with inference and density estimation schemes using optimal transport, and with alternative transformation--based approaches to simulation. We also sketch current work aimed at the construction of transport maps in high dimensions, exploiting essential features of the target distribution (e.g., conditional independence, low-rank structure). The approaches and algorithms presented here have direct applications to Bayesian computation and to broader problems of stochastic simulation.

**Comments:**To appear in Handbook of Uncertainty Quantification; R. Ghanem, D. Higdon, and H. Owhadi, editors; Springer, 2016

## Similar Publications

FRK is an R software package for spatial/spatio-temporal modelling and prediction with large datasets. It facilitates optimal spatial prediction (kriging) on the most commonly used manifolds (in Euclidean space and on the surface of the sphere), for both spatial and spatio-temporal fields. It differs from existing packages for spatial modelling and prediction by avoiding stationary and isotropic covariance and variogram models, instead constructing a spatial random effects (SRE) model on a discretised spatial domain. Read More

We describe and demonstrate the use of the R package acebayes to find Bayesian optimal experimental designs. A decision-theoretic approach is adopted, with the optimal design maximising an expected utility. Finding Bayesian optimal designs for realistic problems is challenging, as the expected utility is typically intractable and the design space may be high-dimensional. Read More

We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework Gambit. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. Read More

Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. Read More

We study Bayesian inference methods for solving linear inverse problems, focusing on hierarchical formulations where the prior or the likelihood function depend on unspecified hyperparameters. In practice, these hyperparameters are often determined via an empirical Bayesian method that maximizes the marginal likelihood function, i.e. Read More

Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has remained less well known than other approaches to epistatic association mapping. Read More

In this manuscript the fixed-lag smoothing problem for conditionally linear Gaussian state-space models is investigated from a factor graph perspective. More specifically, after formulating Bayesian smoothing for an arbitrary state-space model as forward-backward message passing over a factor graph, we focus on the above mentioned class of models and derive a novel Rao-Blackwellized particle smoother for it. Then, we show how our technique can be modified to estimate a point mass approximation of the so called joint smoothing distribution. Read More

Efficiently aggregating data from different sources is a challenging problem, particularly when samples from each source are distributed differently. These differences can be inherent to the inference task or present for other reasons: sensors in a sensor network may be placed far apart, affecting their individual measurements. Conversely, it is computationally advantageous to split Bayesian inference tasks across subsets of data, but data need not be identically distributed across subsets. Read More

A pressing question in Bayesian statistics and machine learning is to introduce a unified theoretical framework that brings together some of the many statistical models and algorithmic methodologies employed by practitioners. In this paper we suggest that the variational formulation of the Bayesian update, coupled with the theory of gradient flows, provides both an overarching structure and a powerful tool for the analysis of many such models and algorithms. As particular instances of our general framework, we provide three variational formulations of the Bayesian update with three associated gradient flows that converge to the posterior. Read More

Linear discriminant analysis (LDA) is a classical method for dimensionality reduction, where discriminant vectors are sought to project data to a lower dimensional space for optimal separability of classes. Several recent papers have outlined strategies for exploiting sparsity for using LDA with high-dimensional data. However, many lack scalable methods for solution of the underlying optimization problems. Read More