# Simulation-based optimal Bayesian experimental design for nonlinear systems

The optimal selection of experimental conditions is essential to maximizing the value of data for inference and prediction, particularly in situations where experiments are time-consuming and expensive to conduct. We propose a general mathematical framework and an algorithmic approach for optimal experimental design with nonlinear simulation-based models; in particular, we focus on finding sets of experiments that provide the most information about targeted sets of parameters. Our framework employs a Bayesian statistical setting, which provides a foundation for inference from noisy, indirect, and incomplete data, and a natural mechanism for incorporating heterogeneous sources of information. An objective function is constructed from information theoretic measures, reflecting expected information gain from proposed combinations of experiments. Polynomial chaos approximations and a two-stage Monte Carlo sampling method are used to evaluate the expected information gain. Stochastic approximation algorithms are then used to make optimization feasible in computationally intensive and high-dimensional settings. These algorithms are demonstrated on model problems and on nonlinear parameter estimation problems arising in detailed combustion kinetics.

**Comments:**Preprint 53 pages, 17 figures (54 small figures). v1 submitted to the Journal of Computational Physics on August 4, 2011; v2 submitted on August 12, 2012. v2 changes: (a) addition of Appendix B and Figure 17 to address the bias in the expected utility estimator; (b) minor language edits; v3 submitted on November 30, 2012. v3 changes: minor edits

## Similar Publications

We give convergence guarantees for estimating the coefficients of a symmetric mixture of two linear regressions by expectation maximization (EM). In particular, if the initializer has a large cosine angle with the population coefficient vector and the signal to noise ratio (SNR) is large, a sample-splitting version of the EM algorithm converges to the true coefficient vector with high probability. Here "large" means that each quantity is required to be at least a universal constant. Read More

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g. Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014. Read More

This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable and can also be used on data with varying graph structure. Read More

**Authors:**Arnaud Joly

Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested $if-then-else$ questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several of such trees are often combined together for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. Read More

The $\ell_1$ regularized sparse model has been favourably used in machine learning society. Due to the non-smoothness, fast optimizers like quasi-Newton methods can not be directly applied. In this paper, we propose the first stochastic limited-memory quasi-newton optimizer that specializing in strongly convex loss function with $\ell_1$-regularization. Read More

Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples ($p> n$) and assume that the high-dimensional parameters vector is $s_0$ sparse. We develop a general and flexible $\ell_\infty$ projection statistic for hypothesis testing in this model. Read More

In this paper, we study the stochastic gradient descent (SGD) method for the nonconvex nonsmooth optimization, and propose an accelerated SGD method by combining the variance reduction technique with Nesterov's extrapolation technique. Moreover, based on the local error bound condition, we establish the linear convergence of our method to obtain a stationary point of the nonconvex optimization. In particular, we prove that not only the sequence generated linearly converges to a stationary point of the problem, but also the corresponding sequence of objective values is linearly convergent. Read More

We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends' activity, thus providing side information about each other's preferences. Read More

Our goal is to learn a semantic parser that maps natural language utterances into executable programs when only indirect supervision is available: examples are labeled with the correct execution result, but not the program itself. Consequently, we must search the space of programs for those that output the correct result, while not being misled by spurious programs: incorrect programs that coincidentally output the correct result. We connect two common learning paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML), and then present a new learning algorithm that combines the strengths of both. Read More

Motivated by machine learning applications in networks of sensors, internet-of-things (IoT) devices, and autonomous agents, we propose techniques for distributed stochastic convex learning from high-rate data streams. The setup involves a network of nodes---each one of which has a stream of data arriving at a constant rate---that solve a stochastic convex optimization problem by collaborating with each other over rate-limited communication links. To this end, we present and analyze two algorithms---termed distributed stochastic approximation mirror descent (D-SAMD) and {\em accelerated} distributed stochastic approximation mirror descent (AD-SAMD)---that are based on two stochastic variants of mirror descent. Read More