# Ole Winther - DTU Compute, Technical University of Denmark, Denmark

## Contact Details

Name: Ole Winther

Affiliation: DTU Compute, Technical University of Denmark, Denmark



## Pub Categories

- Statistics - Machine Learning (14)
- Computer Science - Learning (8)
- Computer Science - Information Theory (5)
- Mathematics - Information Theory (5)
- Physics - Disordered Systems and Neural Networks (2)
- Computer Science - Neural and Evolutionary Computing (2)
- Statistics - Computation (2)
- Quantitative Biology - Quantitative Methods (2)
- Computer Science - Computer Vision and Pattern Recognition (2)
- Computer Science - Computation and Language (2)
- Computer Science - Artificial Intelligence (2)
- Quantitative Biology - Biomolecules (1)
- Computer Science - Digital Libraries (1)
- Computer Science - Information Retrieval (1)
- Statistics - Methodology (1)
- Physics - Data Analysis; Statistics and Probability (1)

## Publications Authored By Ole Winther

We introduce a theoretical approach for designing generalizations of the approximate message passing (AMP) algorithm for compressed sensing which are valid for large observation matrices that are drawn from an invariant random matrix ensemble. By design, the fixed points of the algorithm obey the Thouless-Anderson-Palmer (TAP) equations corresponding to the ensemble. Using a dynamical functional approach we are able to derive an effective stochastic process for the marginal statistics of a single component of the dynamics.

Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real-life data sets contain a small number of labelled data points that are typically disregarded when training generative models. We propose the Cluster-aware Generative Model, which uses unlabelled information to infer a latent representation that models the natural clustering of the data, and additional labelled data points to refine this clustering.

Nanocavity lasers, which are an integral part of an on-chip integrated photonic network, are setting stringent requirements on the sensitivity of the techniques used to characterize the laser performance. Current characterization tools cannot provide detailed knowledge about nanolaser noise and dynamics. In this progress article, we present tools and concepts from Bayesian machine learning and digital coherent detection that offer novel approaches for highly sensitive laser noise characterization and inference of laser dynamics.

Most existing Neural Machine Translation models use groups of characters or whole words as their unit of input and output. We propose a model with a hierarchical char2word encoder, which takes individual characters both as input and output. We first argue that this hierarchical representation of the character encoder reduces computational complexity, and show that it improves translation performance.

We investigate the problem of approximate Bayesian inference for a general class of observation models by means of the expectation propagation (EP) framework for large systems under some statistical assumptions. Our approach tries to overcome the numerical bottleneck of EP caused by the inversion of large matrices. Assuming that the measurement matrices are realizations of specific types of ensembles, we use the concept of freeness from random matrix theory to show that the EP cavity variances exhibit an asymptotic self-averaging property.

How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks, which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state-of-the-art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performance to competing methods on polyphonic music modeling.

The estimation of normalizing constants is a fundamental step in probabilistic model comparison. Sequential Monte Carlo methods may be used for this task and have the advantage of being inherently parallelizable. However, the standard choice of using a fixed number of particles at each iteration is suboptimal because some steps will contribute disproportionately to the variance of the estimate.

Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables, which improves the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive.

Variational Autoencoders are powerful models for unsupervised learning. However, deep models with several layers of dependent stochastic variables are difficult to train, which limits the improvements obtained using these highly expressive models. We propose a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data-dependent approximate likelihood in a process resembling the recently proposed Ladder Network.

We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.

We integrate the recently proposed spatial transformer network (SPN) [Jaderberg et al. 2015] into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences.

In this work we address the problem of solving a series of underdetermined linear inverse problems subject to a sparsity constraint. We generalize the spike and slab prior distribution to encode a priori correlation of the support of the solution in both space and time by imposing a transformed Gaussian process on the spike and slab probabilities. An expectation propagation (EP) algorithm for posterior inference under the proposed model is derived.

We consider the problem of solving TAP mean field equations by iteration for the Ising model with coupling matrices that are drawn at random from general invariant ensembles. We develop an analysis of iterative algorithms using a dynamical functional approach that in the thermodynamic limit yields an effective dynamics of a single variable trajectory. Our main novel contribution is the expression for the implicit memory term of the dynamics for general invariant ensembles.

We are interested in solving the multiple measurement vector (MMV) problem for instances where the underlying sparsity pattern exhibits spatio-temporal structure, motivated by the electroencephalogram (EEG) source localization problem. We propose a probabilistic model that takes this structure into account by generalizing the structured spike and slab prior and the associated Expectation Propagation inference scheme. Based on numerical experiments, we demonstrate the viability of the model and the approximate inference scheme.

Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences.

Recently we extended the approximate message passing (AMP) algorithm to handle general invariant matrix ensembles. In this contribution we extend our S-AMP approach to non-linear observation models. We obtain the generalized AMP (GAMP) algorithm as the special case when the measurement matrix has zero-mean iid Gaussian entries.

Applying traditional collaborative filtering to digital publishing is challenging because user data is very sparse due to the high volume of documents relative to the number of users. Content-based approaches, on the other hand, are attractive because textual content is often very informative. In this paper we describe large-scale content-based collaborative filtering for digital publishing.

Prediction of protein secondary structure from the amino acid sequence is a classical bioinformatics problem. Common methods use feed forward neural networks or SVMs combined with a sliding window, as these models do not naturally handle sequential data. Recurrent neural networks are a generalization of the feed forward neural network that naturally handles sequential data.

The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed with a small additional cost after forming the posterior approximation given the full data.

We present a novel, scalable and Bayesian approach to modelling the occurrence of pairs of symbols (i,j) drawn from a large vocabulary. Observed pairs are assumed to be generated by a simple popularity based selection process followed by censoring using a preference function. By basing inference on the well-founded principle of variational bounding, and using new site-independent bounds, we show how a scalable inference procedure can be obtained for large data sets.

In this work we propose a novel iterative estimation algorithm for linear observation systems called S-AMP whose fixed points are the stationary points of the exact Gibbs free energy under a set of (first- and second-) moment consistency constraints in the large system limit. S-AMP extends the approximate message-passing (AMP) algorithm to general matrix ensembles. The generalization is based on the S-transform (in free probability) of the spectrum of the measurement matrix.

**Authors:** Radu Dragusin (1), Paula Petcu (2), Christina Lioma (3), Birger Larsen (4), Henrik L. Jørgensen (5), Ingemar J. Cox (6), Lars Kai Hansen (7), Peter Ingwersen (8), Ole Winther (9)

**Affiliations:**

- (1) DTU Compute, Technical University of Denmark, Denmark
- (2) DTU Compute, Technical University of Denmark, Denmark
- (3) DTU Compute, Technical University of Denmark, Denmark
- (4) Information Systems and Interaction Design, Royal School of Library and Information Science, Copenhagen, Denmark
- (5) Department of Clinical Biochemistry, Bispebjerg Hospital, Copenhagen, Denmark
- (6) DTU Compute, Technical University of Denmark, Denmark
- (7) DTU Compute, Technical University of Denmark, Denmark
- (8) Information Systems and Interaction Design, Royal School of Library and Information Science, Copenhagen, Denmark
- (9) DTU Compute, Technical University of Denmark, Denmark

Background: The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface for such information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures.

Expectation Propagation (EP) provides a framework for approximate inference. When the model under consideration is over a latent Gaussian field, with the approximation being Gaussian, we show how these approximations can systematically be corrected. A perturbative expansion is made of the exact but intractable correction, and can be applied to the model's partition function and other moments of interest.

We propose an active set selection framework for Gaussian process classification for cases when the dataset is large enough to render its inference prohibitive. Our scheme consists of a two-step alternating procedure of active set update rules and hyperparameter optimization based upon marginal likelihood maximization. The active set update rules rely on the ability of the predictive distributions of a Gaussian process classifier to estimate the relative contribution of a datapoint when being either included or removed from the model.

In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component delta-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables.

A new general algorithm for optimization of potential functions for protein folding is introduced. It is based upon gradient optimization of the thermodynamic stability of native folds of a training set of proteins with known structure. The iterative update rule contains two thermodynamic averages which are estimated by (generalized ensemble) Monte Carlo.

We develop an advanced mean field method for approximating averages in probabilistic data models that is based on the TAP approach of disorder physics. In contrast to conventional TAP, where the knowledge of the distribution of couplings between the random variables is required, our method adapts to the concrete couplings. We demonstrate the validity of our approach, which is so far restricted to models with non-glassy behaviour, by replica calculations for a wide class of models as well as by simulations for a real data set.