# Statistics - Theory Publications (50)

## Search

## Statistics - Theory Publications

Particle filters are a popular and flexible class of numerical algorithms to solve a large class of nonlinear filtering problems. However, standard particle filters with importance weights have been shown to require a sample size that increases exponentially with the dimension D of the state space in order to achieve a certain performance, which precludes their use in very high-dimensional filtering problems. Here, we focus on the dynamic aspect of this curse of dimensionality (COD) in continuous time filtering, which is caused by the degeneracy of importance weights over time. Read More

We consider the statistical inverse problem to recover $f$ from noisy measurements $Y = Tf + \sigma \xi$ where $\xi$ is Gaussian white noise and $T$ a compact operator between Hilbert spaces. Considering general reconstruction methods of the form $\hat f_\alpha = q_\alpha \left(T^*T\right)T^*Y$ with an ordered filter $q_\alpha$, we investigate the choice of the regularization parameter $\alpha$ by minimizing an unbiased estimate of the predictive risk $\mathbb E\left[\Vert Tf - T\hat f_\alpha\Vert^2\right]$. The corresponding parameter $\alpha_{\mathrm{pred}}$ and its usage are well-known in the literature, but oracle inequalities and optimality results in this general setting are unknown. Read More

We study detection methods for multivariable signals under dependent noise. The main focus is on three-dimensional signals, i.e. Read More

**Affiliations:**

^{1}LMRS

We propose an objective prior distribution on correlation kernel parameters for Simple Kriging models in the spirit of reference priors. Because it is proper and defined through its conditional densities, it and its associated posterior distribution lend themselves well to Gibbs sampling, thus making the full-Bayesian procedure tractable. Numerical examples show it has near-optimal frequentist performance in terms of prediction interval coverage Read More

In this work, nonparametric statistical inference is provided for the continuous-time M/G/1 queueing model from a Bayesian point of view. The inference is based on observations of the inter-arrival and service times. Beside other characteristics of the system, particular interest is in the waiting time distribution which is not accessible in closed form. Read More

This paper studies the minimum distance estimation problem for panel data model. We propose the minimum distance estimators of regression parameters of the panel data model and investigate their asymptotic distributions. This paper contains two main contributions. Read More

We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problem arise in various applications, including detection of treatment effects, trend detection in econometrics, signal detection in radar processing, and shape-constrained inference in non-parametric statistics. We provide a sharp characterization of the GLRT testing radius up to a universal multiplicative constant in terms of the geometric structure of the underlying convex cones. Read More

Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional (i.e. Read More

Decision-makers often learn by acquiring information from distinct sources that possibly provide complementary information. We consider a decision-maker who sequentially samples from a finite set of Gaussian signals, and wants to predict a persistent multi-dimensional state at an unknown final period. What signal should he choose to observe in each period? Related problems about optimal experimentation and dynamic learning tend to have solutions that can only be approximated or implicitly characterized. Read More

When dealing with the problem of simultaneously testing a large number of null hypotheses, a natural testing strategy is to first reduce the number of tested hypotheses by some selection (screening or filtering) process, and then to simultaneously test the selected hypotheses. The main advantage of this strategy is to greatly reduce the severe effect of high dimensions. However, the first screening or selection stage must be properly accounted for in order to maintain some type of error control. Read More

A significant literature has arisen to study ways to employing prior knowledge to improve power and precision of multiple testing procedures. Some common forms of prior knowledge may include (a) a priori beliefs about which hypotheses are null, modeled by non-uniform prior weights; (b) differing importances of hypotheses, modeled by differing penalties for false discoveries; (c) partitions of the hypotheses into known groups, indicating (dis)similarity of hypotheses; and (d) knowledge of independence, positive dependence or arbitrary dependence between hypotheses or groups, allowing for more aggressive or conservative procedures. We present a general framework for global null testing and false discovery rate (FDR) control that allows the scientist to incorporate all four types of prior knowledge (a)-(d) simultaneously. Read More

We introduce in this paper a new generalization of the flexible Weibull distribution with four parameters. This model based on the Beta generalized (BG) distribution, Eugene et al. \cite{Eugeneetal2002}, they first using the BG distribution for generating new generalizations. Read More

We consider the nonparametric estimation of the intensity function of a Poisson point process in a circular model from indirect observations $N_1,\ldots,N_n$. These observations emerge from hidden point process realizations with the target intensity through contamination with additive error. Under the assumption that the error distribution is unknown and only available by means of an additional sample $Y_1,\ldots,Y_m$ we derive minimax rates of convergence with respect to the sample sizes $n$ and $m$ under abstract smoothness conditions and propose an orthonormal series estimator which attains the optimal rate of convergence. Read More

**Affiliations:**

^{1}CREST, MODAL'X,

^{2}INRA

Consider the twin problems of estimating the connection probability matrix of an inhomogeneous random graph and the graphon of a W-random graph. We establish the minimax estimation rates with respect to the cut metric for classes of block constant matrices and step function graphons. Surprisingly, our results imply that, from the minimax point of view, the raw data, that is, the adjacency matrix of the observed graph, is already optimal and more involved procedures cannot improve the convergence rates for this metric. Read More

In this article we derive the almost sure convergence theory of Bayes factor in the general set-up that includes even dependent data and misspecified models, as a simple application of a result of Shalizi (2009) to a well-known identity satisfied by the Bayes factor. Read More

There has been substantial recent interest in record linkage, attempting to group the records pertaining to the same entities from a large database lacking unique identifiers. This can be viewed as a type of "microclustering," with few observations per cluster and a very large number of clusters. A variety of methods have been proposed, but there is a lack of literature providing theoretical guarantees on performance. Read More

We study the problem of reconstructing the graph of a sparse Gaussian Graphical Model from independent observations, which is equivalent to finding non-zero elements of an inverse covariance matrix. For a model of size $p$ and maximum degree $d$, information theoretic lower bounds established in prior works require that the number of samples needed for recovering the graph perfectly is at least $d \log p/\kappa^2$, where $\kappa$ is the minimum normalized non-zero entry of the inverse covariance matrix. Existing algorithms require additional assumptions to guarantee perfect graph reconstruction, and consequently, their sample complexity is dependent on parameters that are not present in the information theoretic lower bound. Read More

Multi-parameter one-sided hypothesis test problems arise naturally in many applications. We are particularly interested in effective tests for monitoring multiple quality indices in forestry products. Our search reveals that there are many effective statistical methods in the literature for normal data, and that they can easily be adapted for non-normal data. Read More

This paper develops the theoretical framework and the equations of a new robust Generalized Maximum-likelihood-type Unscented Kalman Filter (GM-UKF) that is able to suppress observation and innovation outliers while filtering out non-Gaussian measurement noise. Because the errors of the real and reactive power measurements calculated using Phasor Measurement Units (PMUs) follow long-tailed probability distributions, the conventional UKF provides strongly biased state estimates since it relies on the weighted least squares estimator. By contrast, the state estimates and residuals of our GM-UKF are proved to be roughly Gaussian, allowing the sigma points to reliably approximate the mean and the covariance matrices of the predicted and corrected state vectors. Read More

The multi-label classification framework, where each observation can be associated with a set of labels, has generated a tremendous amount of attention over recent years. The modern multi-label problems are typically large-scale in terms of number of observations, features and labels, and the amount of labels can even be comparable with the amount of observations. In this context, different remedies have been proposed to overcome the curse of dimensionality. Read More

In a given problem, the Bayesian statistical paradigm requires the specification of a prior distribution that quantifies relevant information, about the unknowns of main interest, external to the data. In cases where little such information is available, the problem under study may possess an invariance under a transformation group that encodes a lack of information, leading to a unique prior. Previous successful examples of this idea have included location-scale invariance under linear transformation, multiplicative invariance of the rate at which events in a counting process are observed, and the derivation of the Haldane prior for a Bernoulli success probability. Read More

We propose a method for variable selection in discriminant analysis with mixed categorical and continuous variables. This method is based on a criterion that permits to reduce the variable selection problem to a problem of estimating suitable permutation and dimensionality. Then, estimators for these parameters are proposed and the resulting method for selecting variables is shown to be consistent. Read More

Stochastic ordering of distributions of random variables may be defined by the relative convexity of the tail functions. This has been extended to higher order stochastic orderings, by iteratively reassigning tail-weights. The actual verification of those stochastic orderings is not simple, as this depends on inverting distribution functions for which there may be no explicit expression. Read More

Classical spectral analysis is based on the discrete Fourier transform of the auto-covariances. In this paper we investigate the asymptotic properties of new frequency domain methods where the auto-covariances in the spectral density are replaced by alternative dependence measures which can be estimated by U-statistics. An interesting example is given by Kendall{'}s $\tau$ , for which the limiting variance exhibits a surprising behavior. Read More

Since its introduction in 2000, the locally linear embedding (LLE) has been widely applied in data science. We provide an asymptotical analysis of the LLE under the manifold setup. We show that for the general manifold, asymptotically we may not obtain the Laplace-Beltrami operator, and the result may depend on the non-uniform sampling, unless a correct regularization is chosen. Read More

We proposed a new penalized method in this paper to solve sparse Poisson Regression problems. Being different from $\ell_1$ penalized log-likelihood estimation, our new method can be viewed as penalized weighted score function method. We show that under mild conditions, our estimator is $\ell_1$ consistent and the tuning parameter can be pre-specified, which shares the same good property of the square-root Lasso. Read More

Probabilistic integration of a continuous dynamical system is a way of systematically introducing model error, at scales no larger than errors inroduced by standard numerical discretisation, in order to enable thorough exploration of possible responses of the system to inputs. It is thus a potentially useful approach in a number of applications such as forward uncertainty quantification, inverse problems, and data assimilation. We extend the convergence analysis of probabilistic integrators for deterministic ordinary differential equations, as proposed by Conrad et al. Read More

In this paper, we consider a probabilistic setting where the probability measures are considered to be random objects. We propose a procedure of construction non-asymptotic confidence sets for empirical barycenters in 2-Wasserstein space and develop the idea further to construction of a non-parametric two-sample test that is then applied to the detection of structural breaks in data with complex geometry. Both procedures mainly rely on the idea of multiplier bootstrap (Spokoiny and Zhilova (2015), Chernozhukov et al. Read More

We consider estimation of certain functionals of random graphs. The random graph is generated by a stochastic block model (SBM). The number of classes is fixed or grows with the number of vertices. Read More

We consider the problem of choosing between parametric models for a discrete observable, taking a Bayesian approach in which the within-model prior distributions are allowed to be improper. In order to avoid the ambiguity in the marginal likelihood function in such a case, we apply a homogeneous scoring rule. For the particular case of distinguishing between Poisson and Negative Binomial models, we conduct simulations that indicate that, applied prequentially, the method will consistently select the true model. Read More

In modern data sets, the number of available variables can greatly exceed the number of observations. In this paper we show how valid confidence intervals can be constructed by approximating the inverse covariance matrix by a scaled Moore-Penrose pseudoinverse, and using the lasso to perform a bias correction. In addition, we propose random least squares, a new regularization technique which yields narrower confidence intervals with the same theoretical validity. Read More

For the particles undergoing the anomalous diffusion with different waiting time distributions for different internal states, we derive the Fokker-Planck and Feymann-Kac equations, respectively, describing positions of the particles and functional distributions of the trajectories of particles; in particular, the equations governing the functional distribution of internal states are also obtained. The dynamics of the stochastic processes are analyzed and the applications, calculating the distribution of the first passage time and the distribution of the fraction of the occupation time, of the equations are given. Read More

**Affiliations:**

^{1}SELECT, LMO

This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. Read More

We propose statistical inferential procedures for panel data models with interactive fixed effects in a kernel ridge regression framework.Compared with traditional sieve methods, our method is automatic in the sense that it does not require the choice of basis functions and truncation parameters.Model complexity is controlled by a continuous regularization parameter which can be automatically selected by generalized cross validation. Read More

We show that two estimators, the Square-Root Lasso and the Square-Root Slope can achieve the exact optimal minimax prediction rate, which is $(s/n) \log(p/s)$ in the setting of the sparse high-dimensional linear regression. Here, $n$ is the sample size, $p$ is the dimension and $s$ is the sparsity parameter. We also prove optimality for the estimation error in the $l_q$-norm, with $q \in [1,2]$ for the Square-Root Lasso, and in the $l_2$ and sorted $l_1$ norms for the Square-Root Slope. Read More

**Affiliations:**

^{1}MAP5,

^{2}SAMM,

^{3}MAP5

We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilisitic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood which allows to infer an optimal number of components. Read More

This paper studies a \textit{partial functional partially linear single-index model} that consists of a functional linear component as well as a linear single-index component. This model generalizes many well-known existing models and is suitable for more complicated data structures. However, its estimation inherits the difficulties and complexities from both components and makes it a challenging problem, which calls for new methodology. Read More

Tensors, or high-order arrays, attract much attention in recent research. In this paper, we propose a general framework for tensor principal component analysis (tensor PCA), which focuses on the methodology and theory for extracting the hidden low-rank structure from the high-dimensional tensor data. A unified solution is provided for tensor PCA with considerations in both statistical limits and computational costs. Read More

This paper examines the limit properties of information criteria (such as AIC, BIC, HQIC) for distinguishing between the unit root model and the various kinds of explosive models. The explosive models include the local-to-unit-root model, the mildly explosive model and the regular explosive model. Initial conditions with different order of magnitude are considered. Read More

Record linkage involves merging records in large, noisy databases to remove duplicate entities. It has become an important area because of its widespread occurrence in bibliometrics, public health, official statistics production, political science, and beyond. Traditional linkage methods directly linking records to one another are computationally infeasible as the number of records grows. Read More

We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive streams of graph edges. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various estimation goals of graph properties. In the context of subgraph counting, we show how edge sampling weights can be chosen so as to minimize the estimation variance of counts of specified sets of subgraphs. Read More

This paper deals with feature selection procedures for spatial point processes intensity estimation. We consider regularized versions of estimating equations based on Campbell theorem derived from two classical functions: Poisson likelihood and logistic regression likelihood. We provide general conditions on the spatial point processes and on penalty functions which ensure consistency, sparsity and asymptotic normality. Read More

We study asymptotic properties of conditional least squares estimators for the drift parameters of two-factor affine diffusions based on continuous time observations. We distinguish three cases: subcritical, critical and supercritical. For all the drift parameters, in the subcritical and supercritical cases, asymptotic normality and asymptotic mixed normality is proved, while in the critical case, non-standard asymptotic behavior is described. Read More

**Affiliations:**

^{1}IMT, LaMME,

^{2}IMT, LaMME,

^{3}LPMA

We introduce a general methodology for post hoc inference in a large-scale multiple testing framework. The approach is called " user-agnostic " in the sense that the statistical guarantee on the number of correct rejections holds for any set of candidate items selected by the user (after having seen the data). This task is investigated by defining a suitable criterion, named the joint-family-wise-error rate (JER for short). Read More

We study the maximum likelihood degree (ML degree) of toric varieties, known as discrete exponential models in statistics. By introducing scaling coefficients to the monomial parameterization of the toric variety, one can change the ML degree. We show that the ML degree is equal to the degree of the toric variety for generic scalings, while it drops if and only if the scaling vector is in the locus of the principal $A$-determinant. Read More

We investigate the problem of testing the equivalence between two discrete histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that is piecewise constant over some set of $k$ intervals over $[n]$. Histograms have been extensively studied in computer science and statistics. Read More

**Affiliations:**

^{1}LM-Orsay,

^{2}IMT,

^{3}LAAS-MAC, CTU,

^{4}LAAS-MAC,

^{5}LAAS-MAC, IMT

We present a new approach to the design of D-optimal experiments with multivariate polynomial regressions on compact semi-algebraic design spaces. We apply the moment-sum-of-squares hierarchy of semidefinite programming problems to solve numerically and approximately the optimal design problem. The geometry of the design is recovered with semidefinite programming duality theory and the Christoffel polynomial. Read More

**Authors:**Jon A. Wellner

Lederer and van de Geer (2013) introduced a new Orlicz norm, the Bernstein-Orlicz norm, which is connected to Bernstein type inequalities. Here we introduce another Orlicz norm, the Bennett-Orlicz norm, which is connected to Bennett type inequalities. The new Bennett-Orlicz norm yields inequalities for expectations of maxima which are potentially somewhat tighter than those resulting from the Bernstein-Orlicz norm when they are both applicable. Read More