Statistics - Computation Publications (50)


Statistics - Computation Publications

Bayesian inference for complex models is challenging due to the need to explore high-dimensional spaces and multimodality and standard Monte Carlo samplers can have difficulties effectively exploring the posterior. We introduce a general purpose rejection-free ensemble Markov Chain Monte Carlo (MCMC) technique to improve on existing poorly mixing samplers. This is achieved by combining parallel tempering and an auxiliary variable move to exchange information between the chains. Read More

Convex sparsity-promoting regularizations are ubiquitous in modern statistical learning. By construction, they yield solutions with few non-zero coefficients, which correspond to saturated constraints in the dual optimization formulation. Working set (WS) strategies are generic optimization techniques that consist in solving simpler problems that only consider a subset of constraints, whose indices form the WS. Read More

In many modern settings, data are acquired iteratively over time, rather than all at once. Such settings are known as online, as opposed to offline or batch. We introduce a simple technique for online parameter estimation, which can operate in low memory settings, settings where data are correlated, and only requires a single inspection of the available data at each time period. Read More

This article proposes a new graphical tool, the magnitude-shape (MS) plot, for visualizing both the magnitude and shape outlyingness of multivariate functional data. The proposed tool builds on the recent notion of functional directional outlyingness, which measures the centrality of functional data by simultaneously considering the level and the direction of their deviation from the central region. The MS-plot intuitively presents not only levels but also directions of magnitude outlyingness on the horizontal axis or plane, and demonstrates shape outlyingness on the vertical axis. Read More

Kernel quadratures and other kernel-based approximation methods typically suffer from prohibitive cubic time and quadratic space complexity in the number of function evaluations. The problem arises because a system of linear equations needs to be solved. In this article we show that the weights of a kernel quadrature rule can be computed efficiently and exactly for up to tens of millions of nodes if the kernel, integration domain, and measure are fully symmetric and the node set is a union of fully symmetric sets. Read More

nimble is an R package for constructing algorithms and conducting inference on hierarchical models. The nimble package provides a unique combination of flexible model specification and the ability to program model-generic algorithms -- specifically, the package allows users to code models in the BUGS language, and it allows users to write algorithms that can be applied to any appropriately-specified BUGS model. In this paper, we introduce nimble's capabilities for state-space model analysis using Sequential Monte Carlo (SMC) techniques. Read More

Integration against an intractable probability measure is among the fundamental challenges of statistical inference, particularly in the Bayesian setting. A principled approach to this problem seeks a deterministic coupling of the measure of interest with a tractable "reference" measure (e.g. Read More

We study the convergence properties of the Gibbs Sampler in the context of posterior distributions arising from Bayesian analysis of Gaussian hierarchical models. We consider centred and non-centred parameterizations as well as their hybrids including the full family of partially non-centred parameterizations. We develop a novel methodology based on multi-grid decompositions to derive analytic expressions for the convergence rates of the algorithm for an arbitrary number of layers in the hierarchy, while previous work was typically limited to the two-level case. Read More

The marginal likelihood plays an important role in many areas of Bayesian statistics such as parameter estimation, model comparison, and model averaging. In most applications, however, the marginal likelihood is not analytically tractable and must be approximated using numerical methods. Here we provide a tutorial on bridge sampling (Bennett, 1976; Meng & Wong, 1996), a reliable and relatively straightforward sampling method that allows researchers to obtain the marginal likelihood for models of varying complexity. Read More

This study presents an innovative method for reducing the number of rating scale items without predictability loss. The "area under the re- ceiver operator curve method" (AUC ROC) is used to implement in the RatingScaleReduction package posted on CRAN. Several cases have been used to illustrate how the stepwise method has reduced the number of rating scale items (variables). Read More

Bayesian optimal experimental design has immense potential to inform the collection of data, so as to subsequently enhance our understanding of a variety of processes. However, a major impediment is the difficulty in evaluating optimal designs for problems with large, or high-dimensional, design spaces. We propose an efficient search heuristic suitable for general optimisation problems, with a particular focus on optimal Bayesian experimental design problems. Read More

Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require a high computational effort to yield reliable inferences. One of the major challenges in phylogenetics regards the estimation of the marginal likelihood. Read More

The Bergm package provides a comprehensive framework for Bayesian inference using Markov chain Monte Carlo (MCMC) algorithms. It can also supply graphical Bayesian goodness-of-fit procedures that address the issue of model adequacy. The package is simple to use and represents an attractive way of analysing network data as it offers the advantage of a complete probabilistic treatment of uncertainty. Read More

We develop an online learning method for prediction, which is important in problems with large and/or streaming data sets. We formulate the learning approach using a covariance-fitting methodology, and show that the resulting predictor has desirable computational and distribution-free properties: It is implemented online with a runtime that scales linearly in the number of samples; has a constant memory requirement; avoids local minima problems; and prunes away redundant feature dimensions without relying on restrictive assumptions on the data distribution. In conjunction with the split conformal approach, it also produces distribution-free prediction confidence intervals in a computationally efficient manner. Read More

In this article we develop a new sequential Monte Carlo (SMC) method for multilevel (ML) Monte Carlo estimation. In particular, the method can be used to estimate expectations with respect to a target probability distribution over an infinite-dimensional and non-compact space as given, for example, by a Bayesian inverse problem with Gaussian random field prior. Under suitable assumptions the MLSMC method has the optimal $O(\epsilon^{-2})$ bound on the cost to obtain a mean-square error of $O(\epsilon^2)$. Read More

The objective of this study is illustrating how to use "spmoran," which is an R package for Moran's eigenvector-based spatial regression analysis. spmoran estimates regression models in the presence of spatial dependence, including eigenvector spatial filtering (ESF) and random effects ESF (RE-ESF) models. These models are allowed to have spatially varying coefficients to capture spatial heterogeneity. Read More

The abundance of data produced daily from large variety of sources has boosted the need of novel approaches on causal inference analysis from observational data. Observational data often contain noisy or missing entries. Moreover, causal inference studies may require unobserved high-level information which needs to be inferred from other observed attributes. Read More

Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays datasets typically have upwards of thousands---sometimes tens or hundreds of thousands---of variables and far fewer samples. To meet this challenge, we develop a new R package called sparsebn for learning the structure of large, sparse graphical models with a focus on Bayesian networks. Read More

Probabilistic integration of a continuous dynamical system is a way of systematically introducing model error, at scales no larger than errors inroduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. Read More

Queue networks describe complex stochastic systems of both theoretical and practical interest. They provide the means to assess alterations, diagnose poor performance and evaluate robustness across sets of interconnected resources. In the present paper, we focus on the underlying continuous-time Markov chains induced by these networks, and we present a flexible method for drawing parameter inference in multi-class Markovian cases with switching and different service disciplines. Read More

Changepoint detection is a central problem in time series and genomic data. For some applications, it is natural to impose constraints on the directions of changes. One example is ChIP-seq data, for which adding an up-down constraint improves peak detection accuracy, but makes the optimization problem more complicated. Read More

This paper presents a new approach for the optimization of GARCH parameters estimation. Firstly, we propose a method for the localization of the maximum. Thereafter, using the methods of least squares, we make a local approximation for the projection of the likelihood function curve on two dimensional planes by a polynomial of order two which will be used to calculate an estimation of the maximum. Read More

Given matrices $X,Y \in R^{n \times K}$ and $S \in R^{K \times K}$ with positive elements, this paper proposes an algorithm fastRG to sample a sparse matrix $A$ with low rank expectation $E(A) = XSY^T$ and independent Poisson elements. This allows for quickly sampling from a broad class of stochastic blockmodel graphs (degree-corrected, mixed membership, overlapping) all of which are specific parameterizations of the generalized random product graph model defined in Section 2.2. Read More

Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. Read More

State estimation in heavy-tailed process and measurement noise is an important challenge that must be addressed in, e.g., tracking scenarios with agile targets and outlier-corrupted measurements. Read More

Probabilistic modeling provides the capability to represent and manipulate uncertainty in data, models, decisions and predictions. We are concerned with the problem of learning probabilistic models of dynamical systems from measured data. Specifically, we consider learning of probabilistic nonlinear state space models. Read More

To infer the parameters of mechanistic models with intractable likelihoods, techniques such as approximate Bayesian computation (ABC) are increasingly being adopted. One of the main disadvantages of ABC in practical situations, however, is that parameter inference must generally rely on summary statistics of the data. This is particularly the case for problems involving high-dimensional data, such as biological imaging experiments. Read More

Mean shift (MS) algorithms are popular methods for mode finding in pattern analysis. Each MS algorithm can be phrased as a fixed-point iteration scheme, which operates on a kernel density estimate (KDE) based on some data. The ability of an MS algorithm to obtain the modes of its KDE depends on whether or not the fixed-point scheme converges. Read More

We propose two approaches for selecting variables in latent class analysis (i.e.,mixture model assuming within component independence), which is the common model-based clustering method for mixed data. Read More

We study the maximum likelihood degree (ML degree) of toric varieties, known as discrete exponential models in statistics. By introducing scaling coefficients to the monomial parameterization of the toric variety, one can change the ML degree. We show that the ML degree is equal to the degree of the toric variety for generic scalings, while it drops if and only if the scaling vector is in the locus of the principal $A$-determinant. Read More

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation (C-TMLE) procedure. Read More

Robust clustering from incomplete data is an important topic because, in many practical situations, real data sets are heavy-tailed, asymmetric, and/or have arbitrary patterns of missing observations. Flexible methods and algorithms for model-based clustering are presented via mixture of the generalized hyperbolic distributions and its limiting case, the mixture of multivariate skew-t distributions. An analytically feasible EM algorithm is formulated for parameter estimation and imputation of missing values for mixture models employing missing at random mechanisms. Read More

Large networks of queueing systems model important real-world systems such as MapReduce clusters, web-servers, hospitals, call-centers and airport passenger terminals.To model such systems accurately we must infer queueing parameters from data. Unfortunately, for many queueing networks there is no clear way to proceed with parameter inference from data. Read More

We consider the problem where we have a multi-way table of means, indexed by several factors, where each factor can have a large number of levels. The entry in each cell is the mean of some response, averaged over the observations falling into that cell. Some cells may be very sparsely populated, and in extreme cases, not populated at all. Read More

Early and accurate identification of parkinsonian syndromes (PS) involving presynaptic degeneration from non-degenerative variants such as Scans Without Evidence of Dopaminergic Deficit (SWEDD) and tremor disorders, is important for effective patient management as the course, therapy and prognosis differ substantially between the two groups. In this study, we use Single Photon Emission Computed Tomography (SPECT) images from healthy normal, early PD and SWEDD subjects, as obtained from the Parkinson's Progression Markers Initiative (PPMI) database, and process them to compute shape- and surface fitting-based features for the three groups. We use these features to develop and compare various classification models that can discriminate between scans showing dopaminergic deficit, as in PD, from scans without the deficit, as in healthy normal or SWEDD. Read More

We study recovery of piecewise-constant signals over arbitrary graphs by the estimator minimizing an $l_0$-edge-penalized objective. Although exact minimization of this objective may be computationally intractable, we show that the same statistical risk guarantees are achieved by the alpha-expansion algorithm which approximately minimizes this objective in polynomial time. We establish that for graphs with small average vertex degree, these guarantees are rate-optimal in a minimax sense over classes of edge-sparse signals. Read More

Integrated Nested Laplace Approximation provides a fast and effective method for marginal inference on Bayesian hierarchical models. This methodology has been implemented in the R-INLA package which permits INLA to be used from within R statistical software. Although INLA is implemented as a general methodology, its use in practice is limited to the models implemented in the R-INLA package. Read More

We harness the power of Bayesian emulation techniques, designed to aid the analysis of complex computer models, to examine the structure of complex Bayesian analyses themselves. These techniques facilitate robust Bayesian analyses and/or sensitivity analyses of complex problems, and hence allow global exploration of the impacts of choices made in both the likelihood and prior specification. We show how previously intractable problems in robustness studies can be overcome using emulation techniques, and how these methods allow other scientists to quickly extract approximations to posterior results corresponding to their own particular subjective specification. Read More

Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects. Differential privacy (DP) has become established as a standard for protecting learning results, but the proposed algorithms require a single trusted party to have access to the entire data, which is a clear weakness. We consider DP Bayesian learning in a distributed setting, where each party only holds a single sample or a few samples of the data. Read More

We present a general class of embeddings based on structured random matrices with orthogonal rows which can be applied in many machine learning applications including dimensionality reduction, kernel approximation and locality-sensitive hashing. We show that this class yields improvements over previous state-of-the-art methods either in computational efficiency (while providing similar accuracy) or in accuracy, or both. In particular, we propose the \textit{Orthogonal Johnson-Lindenstrauss Transform} (OJLT) which is as fast as earlier methods yet provably outperforms them in terms of accuracy, leading to a `free lunch' improvement over previous dimensionality reduction mechanisms. Read More

Since the cost of installing and maintaining sensors is usually high, sensor locations are always strategically selected. For those aiming at inferring certain quantities of interest (QoI), it is desirable to explore the dependency between sensor measurements and QoI. One of the most popular metric for the dependency is mutual information which naturally measures how much information about one variable can be obtained given the other. Read More

Implicit probabilistic models are a flexible class for modeling data. They define a process to simulate observations, and unlike traditional models, they do not require a tractable likelihood function. In this paper, we develop two families of models: hierarchical implicit models and deep implicit models. Read More

This paper proposes an efficient implementation of the multi-sensor generalized labeled multi-Bernoulli (GLMB) filter. The solution exploits the GLMB joint prediction and update together with a new technique for truncating the GLMB filtering density based on Gibbs sampling. The resulting algorithm has quadratic complexity in the number of hypothesized object and linear in the number of measurements of each individual sensors. Read More

Bayesian statistical models allow us to formalise our knowledge about the world and reason about our uncertainty, but there is a need for better procedures to accurately encode its complexity. One way to do so is through compositional models, which are formed by combining blocks consisting of simpler models. One can increase the complexity of the compositional model by either stacking more blocks or by using a not-so-simple model as a building block. Read More

We describe a Markov chain Monte Carlo method to approximately simulate a centered d-dimensional Gaussian vector X with given covariance matrix. The standard Monte Carlo method is based on the Cholesky decomposition, which takes cubic time and has quadratic storage cost in d. In contrast, the storage cost of our algorithm is linear in d. Read More

We develop a confidence interval index for comparing confidence interval estimators based on the confidence interval length and coverage probability. We show that the confidence interval index has range of values within the neighborhood of the range of the coverage probability, [0,1]. In addition, a good confidence interval estimator is shown to have an index value approaching 1; and a bad confidence interval has an index value approaching 0. Read More

This paper considers Event-Chain Monte Carlo simulation schemes in order to design an original irreversible Markov Chain Monte Carlo (MCMC) algorithm for the sampling of complex statistical models. The functioning principles of MCMC sampling methods are firstly recalled, as well as standard Event-Chain Monte Carlo simulation schemes are described. Then, a Forward Event-Chain Monte Carlo sampling methodology is proposed and introduced. Read More

A method for the introduction of second-order derivatives of the log likelihood into HMC algorithms is introduced, which does not require the Hessian to be evaluated at each leapfrog step but only at the start and end of trajectories. Read More

Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of coresets called lightweight coresets that allows for both multiplicative and additive errors. Read More