An introduction to sampling via measure transport

We present the fundamentals of a measure transport approach to sampling. The idea is to construct a deterministic coupling (i.e., a transport map) between a complex "target" probability measure of interest and a simpler reference measure. Given a transport map, one can generate arbitrarily many independent and unweighted samples from the target simply by pushing forward reference samples through the map. We consider two different and complementary scenarios: first, when only evaluations of the unnormalized target density are available, and second, when the target distribution is known only through a finite collection of samples. We show that in both settings the desired transports can be characterized as the solutions of variational problems. We then address practical issues associated with the optimization-based construction of transports: choosing finite-dimensional parameterizations of the map, enforcing monotonicity, quantifying the error of approximate transports, and refining approximate transports by enriching the corresponding approximation spaces. Approximate transports can also be used to "Gaussianize" complex distributions and thus precondition conventional asymptotically exact sampling schemes. We place the measure transport approach in a broader context, describing connections with other optimization-based samplers, with inference and density estimation schemes using optimal transport, and with alternative transformation-based approaches to simulation. We also sketch current work aimed at the construction of transport maps in high dimensions, exploiting essential features of the target distribution (e.g., conditional independence, low-rank structure). The approaches and algorithms presented here have direct applications to Bayesian computation and to broader problems of stochastic simulation.
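The sample-based scenario can be illustrated in one dimension. The following is a minimal sketch under stated assumptions, not the authors' parameterization or algorithm. It fits a monotone map S from the (standardized) target to a standard normal reference by minimizing the sample average of 0.5*S(u)^2 - log S'(u). Up to additive constants, this is a sample-average estimate of the KL divergence from the target to the pullback of N(0, 1) through S (equivalently, the negative log-likelihood of the samples under that pullback density). New target samples are then produced by pushing reference samples through the inverse map. The map family S(u) = a + e^p u + e^q u^3 (monotone by construction because both coefficients are positive), the helper names `cubic_map`, `fit_map`, and `push_forward`, the plain gradient-descent optimizer, and the standardization step are all hypothetical choices made for illustration.

```python
import math
import random


def cubic_map(theta, u):
    """Hypothetical monotone map S(u) = a + e^p u + e^q u^3; returns (S(u), S'(u))."""
    a, p, q = theta
    b, c = math.exp(p), math.exp(q)  # both positive, so S'(u) > 0 for every u
    return a + b * u + c * u ** 3, b + 3.0 * c * u * u


def fit_map(xs, steps=500, lr=0.1):
    """Gradient descent on J(theta) = mean(0.5 * S(u)^2 - log S'(u)) over the
    standardized samples u. Up to constants, J is the sample estimate of the KL
    divergence from the target to the pullback of N(0, 1) through S."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    us = [(x - m) / s for x in xs]
    theta = [0.0, 0.0, math.log(0.01)]  # start close to the identity map
    for _ in range(steps):
        b, c = math.exp(theta[1]), math.exp(theta[2])
        ga = gp = gq = 0.0
        for u in us:
            su, dsu = cubic_map(theta, u)
            ga += su                                        # dJ/da
            gp += su * b * u - b / dsu                      # dJ/dp
            gq += su * c * u ** 3 - 3.0 * c * u * u / dsu   # dJ/dq
        theta = [t - lr * g / n for t, g in zip(theta, (ga, gp, gq))]
    return theta, m, s


def push_forward(zs, theta, m, s, iters=60):
    """Map reference samples z ~ N(0, 1) to x = m + s * S^{-1}(z).
    S is increasing and onto R, so S^{-1}(z) is found by bracketing + bisection."""
    out = []
    for z in zs:
        lo, hi = -1.0, 1.0
        while cubic_map(theta, lo)[0] > z:  # widen the bracket to the left
            lo *= 2.0
        while cubic_map(theta, hi)[0] < z:  # widen the bracket to the right
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if cubic_map(theta, mid)[0] < z:
                lo = mid
            else:
                hi = mid
        out.append(m + s * 0.5 * (lo + hi))
    return out


# Usage: the target is known only through samples (here drawn from N(2, 0.5^2)).
random.seed(0)
target = [random.gauss(2.0, 0.5) for _ in range(1000)]
theta, m, s = fit_map(target)
reference = [random.gauss(0.0, 1.0) for _ in range(2000)]
new_samples = push_forward(reference, theta, m, s)
mean_new = sum(new_samples) / len(new_samples)
std_new = math.sqrt(sum((x - mean_new) ** 2 for x in new_samples) / len(new_samples))
print(f"mean {mean_new:.3f}, std {std_new:.3f}")  # should be near 2 and 0.5
```

A Gaussian target is used here because the best map in this family is then essentially affine, which makes the result easy to check. Skewed or multimodal targets would call for a richer monotone parameterization, which is the parameterization and monotonicity question the abstract raises.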

Comments: To appear in Handbook of Uncertainty Quantification; R. Ghanem, D. Higdon, and H. Owhadi, editors; Springer, 2016
