# Gradient-based stochastic optimization methods in Bayesian experimental design

Optimal experimental design (OED) seeks experiments expected to yield the most useful data for some purpose. In practical circumstances where experiments are time-consuming or resource-intensive, OED can yield enormous savings. We pursue OED for nonlinear systems from a Bayesian perspective, with the goal of choosing experiments that are optimal for parameter inference. Our objective in this context is the expected information gain in model parameters, which in general can only be estimated using Monte Carlo methods. Maximizing this objective thus becomes a stochastic optimization problem. This paper develops gradient-based stochastic optimization methods for the design of experiments on a continuous parameter space. Given a Monte Carlo estimator of expected information gain, we use infinitesimal perturbation analysis to derive gradients of this estimator. We are then able to formulate two gradient-based stochastic optimization approaches: (i) Robbins-Monro stochastic approximation, and (ii) sample average approximation combined with a deterministic quasi-Newton method. A polynomial chaos approximation of the forward model accelerates objective and gradient evaluations in both cases. We discuss the implementation of these optimization methods, then conduct an empirical comparison of their performance. To demonstrate design in a nonlinear setting with partial differential equation forward models, we use the problem of sensor placement for source inversion. Numerical results yield useful guidelines on the choice of algorithm and sample sizes, assess the impact of estimator bias, and quantify tradeoffs of computational cost versus solution quality and robustness.
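To make the objective concrete, here is a minimal sketch of a nested Monte Carlo estimator of expected information gain. It is not the paper's implementation. The linear-Gaussian forward model, the noise level `SIGMA`, the sample sizes, and all function names are assumptions chosen so that the estimate can be checked against a closed form. Reusing one fixed set of samples for every design turns the estimator into a deterministic function of the design, which is the idea behind sample average approximation. For brevity it is maximized here by a plain grid search rather than by the quasi-Newton method named in the abstract.

```python
import numpy as np

SIGMA = 0.5  # assumed noise standard deviation (hypothetical)

def forward(theta, d):
    """Hypothetical forward model: linear in theta, scaled by the design d."""
    return theta * d

def log_likelihood(y, theta, d):
    """Gaussian log-density of the data y given parameter theta and design d."""
    r = y - forward(theta, d)
    return -0.5 * (r / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2.0 * np.pi))

def eig_nested_mc(d, theta_outer, noise, theta_inner):
    """Nested Monte Carlo estimate of expected information gain at design d.

    The samples are passed in, so the same draws can be reused for every d.
    """
    y = forward(theta_outer, d) + SIGMA * noise      # simulated data, shape (N,)
    log_lik = log_likelihood(y, theta_outer, d)      # log p(y_i | theta_i, d)
    # Inner average estimates the evidence p(y_i | d) from prior samples.
    ll_inner = log_likelihood(y[:, None], theta_inner[None, :], d)  # (N, M)
    m = ll_inner.max(axis=1, keepdims=True)          # log-sum-exp stabilizer
    log_evidence = m[:, 0] + np.log(np.mean(np.exp(ll_inner - m), axis=1))
    return float(np.mean(log_lik - log_evidence))

def eig_exact(d):
    """Closed form for this toy model with a standard normal prior on theta."""
    return 0.5 * np.log1p((d / SIGMA) ** 2)

rng = np.random.default_rng(0)
N, M = 2000, 1000                       # outer and inner sample sizes (assumed)
theta_outer = rng.standard_normal(N)    # prior draws for the outer loop
noise = rng.standard_normal(N)          # standardized observation noise
theta_inner = rng.standard_normal(M)    # prior draws for the inner loop

designs = np.linspace(0.0, 1.0, 11)
estimates = np.array(
    [eig_nested_mc(d, theta_outer, noise, theta_inner) for d in designs]
)
best_design = designs[np.argmax(estimates)]
```

Comparing `estimates` with `eig_exact(designs)` gives a rough view of the estimator's bias and variance at the chosen `N` and `M`. With a nonlinear forward model no closed form is available, so this check is specific to the toy setting.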

**Comments:** Preprint, 40 pages, 10 figures (121 small figures). v1 submitted to the International Journal for Uncertainty Quantification on December 10, 2012; v2 submitted on September 10, 2013. v2 changes: (a) clarified algorithm stopping criteria and other parameters; (b) emphasized paper contributions, plus other minor edits. v3 submitted on December 26, 2014. v3 changes: minor edits.

## Similar Publications

**Affiliations:**

^{1}University of California Davis,

^{2}University of Duisburg-Essen

**Category:** Statistics - Computation

In this note we describe experiments on an implementation of two methods proposed in the literature for computing regions that correspond to a notion of order statistics for multidimensional data. Our implementation, which works for any dimension greater than one, is the only one that we know of to be publicly available. Experiments run using the software confirm that half-space peeling generally gives better results than directly peeling convex hulls, but at a computational cost.

In many real problems, dependence structures more general than exchangeability are required. For instance, in some settings partial exchangeability is a more reasonable assumption. For this reason, vectors of dependent Bayesian nonparametric priors have recently gained popularity.

BDSAR is an R package that estimates distances between probability distributions and facilitates a dynamic and powerful analysis of diagnostics for Bayesian models from the class of Simultaneous Autoregressive (SAR) spatial models. The package offers a new plot for comparing models and works in an intuitive way that lets any analyst easily build such plots. These plots help reveal influential observations in the data.

This article reviews the application of advanced Monte Carlo techniques in the context of Multilevel Monte Carlo (MLMC). MLMC is a strategy employed to compute expectations which can be biased in some sense, for instance, by using the discretization of an associated probability law. The MLMC approach works with a hierarchy of biased approximations which become progressively more accurate and more expensive.

The scalable calculation of matrix determinants has been a bottleneck to the widespread application of many machine learning methods such as determinantal point processes, Gaussian processes, generalised Markov random fields, graph models and many others. In this work, we estimate log determinants under the framework of maximum entropy, given information in the form of moment constraints from stochastic trace estimation. The estimates demonstrate a significant improvement on state-of-the-art alternative methods, as shown on a wide variety of UFL sparse matrices.

The ensemble Kalman filter (EnKF) is a computational technique for approximate inference on the state vector in spatio-temporal state-space models. It has been successfully used in many real-world nonlinear data-assimilation problems with very high dimensions, such as weather forecasting. However, the EnKF is most appropriate for additive Gaussian state-space models with a linear observation equation and without unknown parameters.

Lorentz Transmission Electron Microscopy (LTEM) observations of magnetic nanoparticles contain information on the magnetic and electrostatic potentials. Vector Field Electron Tomography (VFET) can be used to reconstruct electromagnetic potentials of the nanoparticles from their corresponding LTEM images. The VFET approach is based on the conventional filtered back projection approach to tomographic reconstructions. Because experimental limitations make only an incomplete set of measurements available, the reconstructed vector fields exhibit significant artifacts.

A new recalibration post-processing method is presented to improve the quality of the posterior approximation when using Approximate Bayesian Computation (ABC) algorithms. Recalibration may be used in conjunction with existing post-processing methods, such as regression-adjustments. In addition, this work extends and strengthens the links between ABC and indirect inference algorithms, allowing more extensive use of misspecified auxiliary models in the ABC context.

In the quest for scalable Bayesian computational algorithms, we need to exploit the full potential of existing methodologies. In this note we point out that message passing algorithms, which are very well developed for inference in graphical models, appear to be largely unexplored for scalable inference in Bayesian multilevel regression models. We show that nested multilevel regression models with Gaussian errors lend themselves very naturally to the combined use of belief propagation and MCMC.

Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two ubiquitous growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are standard statistical packages for estimating the structural properties of complex networks, there is no corresponding package when it comes to the estimation of growth mechanisms.