# G. Nuel

## Contact Details

**Name:** G. Nuel

## Publication Categories

- Statistics - Applications (10)
- Mathematics - Probability (4)
- Statistics - Methodology (3)
- Statistics - Computation (3)
- Mathematics - Information Theory (2)
- Computer Science - Information Theory (2)
- Quantitative Biology - Quantitative Methods (1)
- Computer Science - Symbolic Computation (1)
- Computer Science - Learning (1)
- Quantitative Biology - Genomics (1)

## Publications Authored By G. Nuel

Probabilistic Latent Component Analysis (PLCA) is a statistical modeling method for feature extraction from non-negative data. It has been applied fruitfully in various areas of information retrieval. However, the optimization problem solved by EM during parameter estimation for PLCA-based models has never been properly posed and justified.

**Affiliations:**

¹MAP5, ²LPMA

**Category:** Statistics - Methodology
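
For readers unfamiliar with PLCA, the following sketch shows the EM updates commonly used for basic two-dimensional PLCA, where the non-negative matrix V(f, t) is modeled as ∑_z P(z) P(f|z) P(t|z). It is an illustration under that assumed model, not the formulation the paper proposes; the function name `plca_em` and its parameters are hypothetical. Each iteration records the weighted log-likelihood ∑ V log P(f, t), which EM should never decrease.

```python
import numpy as np

# Illustrative sketch of standard 2-D PLCA EM; names and defaults are hypothetical.
def plca_em(V, n_components, n_iter=200, seed=0):
    """EM for 2-D PLCA: V(f, t) is modeled as sum_z P(z) P(f|z) P(t|z)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pz = np.full(n_components, 1.0 / n_components)
    Pf = rng.random((F, n_components))
    Pf /= Pf.sum(axis=0)                       # column z holds P(f | z)
    Pt = rng.random((T, n_components))
    Pt /= Pt.sum(axis=0)                       # column z holds P(t | z)
    loglik = []
    for _ in range(n_iter):
        # E-step: joint[f, t, z] = P(z) P(f|z) P(t|z), then posterior P(z | f, t)
        joint = Pz[None, None, :] * Pf[:, None, :] * Pt[None, :, :]
        model = joint.sum(axis=2)              # P(f, t)
        post = joint / model[:, :, None]
        loglik.append(float(np.sum(V * np.log(model))))
        # M-step: normalize the V-weighted posterior counts
        counts = V[:, :, None] * post
        Pf = counts.sum(axis=1)
        Pf /= Pf.sum(axis=0)
        Pt = counts.sum(axis=0)
        Pt /= Pt.sum(axis=0)
        Pz = counts.sum(axis=(0, 1))
        Pz /= Pz.sum()
    return Pz, Pf, Pt, loglik
```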

In the context of survival analysis, we propose a new method for estimating a piecewise constant hazard rate model. The method provides an automatic procedure to find the number and location of cut points and to estimate the hazard on each cut interval. Estimation is performed through a penalized likelihood using an adaptive ridge procedure.
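
The adaptive ridge idea can be sketched on grouped survival data. The following are assumptions, not details from the abstract: time is split into a fine grid of candidate intervals, each summarized by its event count and exposure (a Poisson likelihood for the log-hazard); jumps between adjacent log-hazards receive a weighted ridge penalty whose weights are reset to 1/(Δ² + ε²) after each fit; and a jump is kept as a cut point when its weighted squared size exceeds 0.99. The names and defaults (`adaptive_ridge_hazard`, `pen`, `eps`) are hypothetical.

```python
import numpy as np

# Illustrative sketch; the grouped-data likelihood, weight update and 0.99
# selection threshold are assumptions, and all names are hypothetical.
def adaptive_ridge_hazard(events, exposure, pen, n_outer=100, n_newton=20, eps=1e-5):
    """Piecewise-constant hazard on a fine grid of candidate intervals.

    events[j]   : number of events in candidate interval j
    exposure[j] : total time at risk in candidate interval j
    pen         : penalty level (each retained cut costs roughly pen / 2)
    """
    d = np.asarray(events, dtype=float)
    e = np.asarray(exposure, dtype=float)
    J = len(d)
    D = np.diff(np.eye(J), axis=0)              # (D @ a)[j] = a[j+1] - a[j]
    a = np.full(J, np.log(d.sum() / e.sum()))   # log-hazard, constant start
    w = np.ones(J - 1)                          # adaptive ridge weights
    for _ in range(n_outer):
        # Newton-Raphson on the penalized Poisson log-likelihood
        # sum(d * a - e * exp(a)) - pen / 2 * sum(w * (D @ a) ** 2)
        for _ in range(n_newton):
            mu = e * np.exp(a)
            grad = d - mu - pen * D.T @ (w * (D @ a))
            neg_hess = np.diag(mu) + pen * D.T @ (w[:, None] * D)
            a = a + np.linalg.solve(neg_hess, grad)
        w = 1.0 / ((D @ a) ** 2 + eps ** 2)     # reweighting: penalty -> ~pen/2 per jump
    delta = D @ a
    cuts = np.flatnonzero(w * delta ** 2 > 0.99)  # jumps that survive reweighting
    # Refit: unpenalized hazard (events / exposure) on each selected segment
    bounds = np.concatenate(([0], cuts + 1, [J]))
    hazard = np.empty(J)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        hazard[lo:hi] = d[lo:hi].sum() / e[lo:hi].sum()
    return cuts, hazard
```

A cut index `j` means the hazard changes between candidate intervals `j` and `j + 1`; the final refit removes the shrinkage that the ridge penalty leaves on the retained jumps.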

Mendelian diseases are determined by a single mutation in a given gene. However, for diseases with late onset, the age at onset is variable; onset may not even occur within a lifetime. Estimating the survival function of mutation carriers, and the effect of modifying factors such as sex, mutation, and origin, is important both for the management of mutation carriers and for prevention.

**Affiliations:**

¹MAP5, ²LPMA

**Category:** Statistics - Applications

In this article, we propose a new statistical approach that models survival heterogeneity as a breakpoint model in an ordered sequence of time-to-event variables. The survival responses must be ordered according to a numerical covariate. Our estimation method aims to detect heterogeneity that could arise through the ordering covariate.
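
As a deliberately simplified illustration of a breakpoint in covariate-ordered survival data, the sketch below assumes exponential (constant-hazard) survival within each segment, a single breakpoint, and right-censored data, and scans every split of the ordered sample. The paper's actual model and estimation method may differ, and the function names are hypothetical. The best split always fits at least as well as no split, so deciding whether a breakpoint exists at all requires a penalty or a test, which is not shown.

```python
import math

# Illustrative sketch assuming exponential survival per segment; names are hypothetical.
def exp_segment_loglik(times, events):
    """Maximized exponential log-likelihood of one segment.

    times  : follow-up times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    n_events = sum(events)
    total_time = sum(times)
    if n_events == 0:
        return 0.0                      # MLE rate is 0, log-likelihood is 0
    rate = n_events / total_time        # MLE of the constant hazard
    return n_events * math.log(rate) - rate * total_time

def best_breakpoint(times, events, min_size=1):
    """Best single breakpoint k: segments [0, k) and [k, n) in covariate order."""
    n = len(times)
    best_k, best_ll = None, -math.inf
    for k in range(min_size, n - min_size + 1):
        ll = (exp_segment_loglik(times[:k], events[:k])
              + exp_segment_loglik(times[k:], events[k:]))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll
```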

Penalized selection criteria such as AIC or BIC are among the most popular methods for variable selection. Their theoretical properties have been studied intensively and are well understood, but using them with high-dimensional data is difficult because of the non-convex optimization problem induced by L0 penalties. An elegant solution to this problem is provided by the multi-step adaptive lasso, in which a sequence of weighted lasso problems is solved, with weights updated so that the procedure converges towards selection with L0 penalties.
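
The multi-step adaptive lasso can be sketched with a plain coordinate-descent solver. In this illustration (function names, `lam`, and `eps` are hypothetical), the objective is assumed to be ½‖y - Xβ‖² + lam ∑_j w_j |β_j|; each step solves a weighted lasso and then sets w_j = 1/(|β̂_j| + ε), so a nonzero coefficient ends up paying roughly `lam` regardless of its size, as it would under an L0 penalty. A coefficient that reaches zero receives weight 1/ε and stays at zero.

```python
import numpy as np

# Illustrative sketch; solver, weight rule and names are assumptions.
def weighted_lasso_cd(X, y, lam, w, n_sweeps=200):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * sum_j w_j * |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()   # current residual y - X b
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]             # take coordinate j out of the fit
            rho = X[:, j] @ r
            # soft-thresholding with coordinate-specific threshold lam * w_j
            b[j] = np.sign(rho) * max(abs(rho) - lam * w[j], 0.0) / col_sq[j]
            r -= X[:, j] * b[j]             # put the updated coordinate back
    return b

def multistep_adaptive_lasso(X, y, lam, n_steps=10, eps=1e-6):
    """Reweighted lasso: weights 1 / (|b_j| + eps) push the penalty towards L0."""
    w = np.ones(X.shape[1])                 # the first step is an ordinary lasso
    b = np.zeros(X.shape[1])
    for _ in range(n_steps):
        b = weighted_lasso_cd(X, y, lam, w)
        w = 1.0 / (np.abs(b) + eps)
    return b
```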

It is generally acknowledged that most complex diseases are affected in part by gene-gene and/or gene-environment interactions. Taking environmental exposures and their interactions with genetic factors into account in genome-wide association studies (GWAS) can help identify high-risk subgroups in the population and provide a better understanding of the disease. For this reason, many methods have been developed to detect gene-environment (G*E) interactions.

Background: Inference of gene regulatory networks from transcriptomic data has been an active research area in recent years. Proposed methods are mainly based on graphical Gaussian models for observational wild-type data and yield undirected graphs that cannot accurately highlight the causal relationships among genes. In the present work, we seek to improve estimation of causal effects among genes by jointly modeling observational transcriptomic data with intervention data obtained by performing knock-outs or knock-downs on a subset of genes.
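
A minimal sketch of why intervention data help, assuming a linear Gaussian model whose gene ordering is known and in which every earlier gene is a potential parent: in a sample where gene j is knocked out, its value is set externally, so its own regression equation does not hold and that sample is dropped when estimating the effects on gene j, while it is still used for the other genes. This is an assumed simplification, not the paper's estimation procedure, and the function name is hypothetical.

```python
import numpy as np

# Illustrative sketch: known causal order, linear Gaussian model; name is hypothetical.
def causal_effects_with_interventions(X, intervened):
    """Estimate B[k, j] = effect of gene k on gene j (k < j).

    X          : (n, p) expression matrix, columns already in causal order
    intervened : (n, p) boolean mask, True where gene j was knocked out in sample i
    """
    n, p = X.shape
    B = np.zeros((p, p))
    for j in range(1, p):
        rows = ~intervened[:, j]            # gene j's own equation holds only here
        A = np.column_stack([np.ones(rows.sum()), X[rows, :j]])
        coef, *_ = np.linalg.lstsq(A, X[rows, j], rcond=None)
        B[:j, j] = coef[1:]                 # drop the intercept
    return B
```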

Methodological development for the inference of gene regulatory networks from transcriptomic data is an active and important research area. Several approaches have been proposed to infer relationships among genes from observational steady-state expression data alone, mainly based on the use of graphical Gaussian models. However, these methods rely on the estimation of partial correlations and are only able to provide undirected graphs that cannot highlight causal relationships among genes.
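
The limitation described above can be seen directly: partial correlations obtained from the inverse covariance (precision) matrix form a symmetric matrix, so they indicate which pairs of genes are conditionally dependent but not the direction of influence. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

# Illustrative sketch; the function name is hypothetical.
def partial_correlations(X):
    """Partial correlation of every pair of columns given all other columns."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))   # precision matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```

For a simulated chain gene 0 → gene 1 → gene 2, the entry for genes 0 and 2 is close to zero; reversing every arrow would give the same pattern of zeros, which is exactly why such graphs are undirected.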

Background. With increasing interest in post-GWAS research, which represents a transition from genome-wide association discovery to the analysis of functional mechanisms, attention has recently focused on the potential of including various biological materials in epidemiological studies. In particular, exploring the carcinogenic process through transcriptional analysis at the epidemiological level opens new horizons in functional analysis and causal inference, and requires a new design together with adequate analysis procedures.

The detection of change-points in heterogeneous sequences is a statistical challenge with many applications in fields such as finance, signal analysis and biology. A large body of literature addresses finding an ideal set of change-points to characterize the data. In this tutorial, we elaborate on the Hidden Markov Model (HMM) and present two different frameworks for applying HMMs to change-point models.
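
One common way of casting change-point detection as an HMM, assumed here for illustration and not necessarily one of the tutorial's two frameworks, is a left-to-right chain whose hidden state is the index of the current segment: at each position the chain either stays or moves to the next segment. The sketch assumes Gaussian emissions with known segment means and noise level and a fixed jump probability, and computes posterior segment probabilities with a scaled forward-backward pass; all names and defaults are hypothetical.

```python
import numpy as np

# Illustrative sketch; model choices and names are assumptions.
def segment_hmm_posteriors(y, means, sd=1.0, p_jump=0.05):
    """Posterior P(segment k | y) at each position for a left-to-right Gaussian HMM."""
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    n, K = len(y), len(means)
    # Emission likelihoods; the Gaussian normalizing constant cancels in the posterior.
    em = np.exp(-0.5 * ((y[:, None] - means[None, :]) / sd) ** 2)
    # Stay in segment k with prob 1 - p_jump, or move on to segment k + 1.
    T = np.diag(np.full(K, 1.0 - p_jump)) + np.diag(np.full(K - 1, p_jump), 1)
    T[-1, -1] = 1.0                       # the last segment is absorbing
    start = np.zeros(K)
    start[0] = 1.0                        # the sequence starts in the first segment
    fwd = np.zeros((n, K))
    scale = np.zeros(n)
    for i in range(n):                    # scaled forward pass
        f = start * em[0] if i == 0 else (fwd[i - 1] @ T) * em[i]
        scale[i] = f.sum()
        fwd[i] = f / scale[i]
    bwd = np.ones((n, K))
    for i in range(n - 2, -1, -1):        # scaled backward pass
        bwd[i] = (T @ (em[i + 1] * bwd[i + 1])) / scale[i + 1]
    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)
```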

In this paper, we consider the Integrated Completed Likelihood (ICL) as a useful criterion for estimating the number of changes in the underlying distribution of data in problems where detecting the precise location of these changes is the main goal. The exact computation of the ICL requires O(Kn²) operations (with K the number of segments and n the number of data points), which is prohibitive in many practical situations with large sequences of data. We describe a framework to estimate the ICL with O(Kn) complexity.
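
For reference, the ICL for segmentation is often written as the negative log-evidence plus the entropy of the segmentation posterior. The notation is assumed rather than taken from the paper: $Y$ is the data, $\mathcal{M}_K$ the set of segmentations into $K$ segments, and any prior on $K$ is omitted. The entropy term requires a sum over all segmentations in $\mathcal{M}_K$.

$$
\mathrm{ICL}(K) = -\log p(Y \mid K) + \mathcal{H}(K),
\qquad
\mathcal{H}(K) = -\sum_{m \in \mathcal{M}_K} p(m \mid Y, K)\,\log p(m \mid Y, K),
\qquad
\hat{K} = \arg\min_K \mathrm{ICL}(K).
$$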

We measure the influence of individual observations on the sequence of hidden states of a Hidden Markov Model (HMM) by means of the Kullback-Leibler distance (KLD). Namely, we consider the KLD between the conditional distribution of the hidden state chain given the complete sequence of observations and its conditional distribution given all observations except the one under consideration. We introduce a linear-complexity algorithm for computing the influence of all the observations.
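
To make the definition concrete, the influence of observation i can be computed by brute force on a tiny discrete HMM: enumerate every hidden path, compute its posterior with and without the emission term of observation i, and take KL(full ‖ leave-one-out). This enumeration is exponential in the sequence length and serves only as a reference check, not as the linear-complexity algorithm of the paper; the function names and the direction of the divergence are assumptions.

```python
import itertools
import numpy as np

# Illustrative brute-force reference; names and KL direction are assumptions.
def path_posterior(init, trans, em):
    """Posterior probability of every hidden path, by exhaustive enumeration."""
    n, K = em.shape
    paths = list(itertools.product(range(K), repeat=n))
    weights = np.empty(len(paths))
    for idx, s in enumerate(paths):
        w = init[s[0]] * em[0, s[0]]
        for t in range(1, n):
            w *= trans[s[t - 1], s[t]] * em[t, s[t]]
        weights[idx] = w
    return weights / weights.sum()

def influence(init, trans, em, i):
    """KL(P(S | all obs) || P(S | all obs but i)), exponential in n."""
    p = path_posterior(init, trans, em)
    em_minus = em.copy()
    em_minus[i, :] = 1.0            # dropping observation i = constant emission term
    q = path_posterior(init, trans, em_minus)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

An observation whose emission probabilities are equal across states carries no information about the hidden chain, so its influence is zero.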

The detection of change-points in heterogeneous sequences is a statistical challenge with applications across a wide variety of fields. In bioinformatics, a vast amount of methodology exists to identify an ideal set of change-points for detecting Copy Number Variation (CNV). While many efficient algorithms are currently available for finding the best segmentation of CNV data, relatively few approaches address the important problem of assessing the uncertainty of change-point locations.
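
In the simplest case of a single change-point with known segment means and noise level (assumptions made for this illustration only), the posterior distribution of the change-point location under a uniform prior can be computed exactly in O(n) with cumulative sums, which gives a direct measure of location uncertainty. The function name and parameters are hypothetical.

```python
import numpy as np

# Illustrative sketch: one change-point, known means and sd, uniform prior.
def changepoint_posterior(y, mu1, mu2, sd=1.0):
    """P(tau = t | y) for t = 1..n-1, where tau = number of points in segment 1."""
    y = np.asarray(y, dtype=float)
    ll1 = -0.5 * ((y - mu1) / sd) ** 2      # per-point log-lik under segment 1
    ll2 = -0.5 * ((y - mu2) / sd) ** 2      # per-point log-lik under segment 2
    c1 = np.cumsum(ll1)                     # c1[t - 1] = sum over the first t points
    c2 = np.cumsum(ll2[::-1])[::-1]         # c2[t] = sum over points t..n-1
    logpost = c1[:-1] + c2[1:]              # entry k corresponds to tau = k + 1
    logpost -= logpost.max()                # guard against underflow
    post = np.exp(logpost)
    return post / post.sum()
```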

Assessing the statistical power to detect susceptibility variants plays a critical role in GWA studies, from both the prospective and retrospective points of view. Power is estimated empirically by simulating phenotypes under a disease model H1. For this purpose, the "gold" standard consists of simulating genotypes given the phenotypes (e.
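
The retrospective simulation scheme can be sketched under assumptions that are not from the abstract: Hardy-Weinberg equilibrium, a multiplicative disease model with penetrance f0·r^g for g risk alleles, and an allelic chi-square test at α = 0.05 (critical value 3.841). Genotype frequencies in cases and controls follow from Bayes' rule, genotypes are drawn given the phenotypes, and power is the rejection rate. All function names and parameters are hypothetical.

```python
import numpy as np

# Illustrative sketch; names, parameters and the disease model are hypothetical.
def genotype_probs_given_status(maf, grr, f0):
    """P(G = g | case) and P(G = g | control) for g = 0, 1, 2 risk alleles.

    Assumes Hardy-Weinberg equilibrium and multiplicative penetrances f0 * grr**g.
    """
    q = maf
    pg = np.array([(1 - q) ** 2, 2 * q * (1 - q), q ** 2])   # HWE genotype frequencies
    pen = f0 * grr ** np.arange(3)                           # penetrance for g = 0, 1, 2
    case = pen * pg / np.sum(pen * pg)                       # Bayes' rule, affected
    ctrl = (1 - pen) * pg / np.sum((1 - pen) * pg)           # Bayes' rule, unaffected
    return case, ctrl

def allelic_chi2(g_case, g_ctrl):
    """Pearson chi-square on the 2x2 table of allele counts."""
    g_case = np.asarray(g_case, dtype=float)
    g_ctrl = np.asarray(g_ctrl, dtype=float)
    a = g_case[1] + 2 * g_case[2]      # risk alleles in cases
    b = 2 * g_case.sum() - a           # other alleles in cases
    c = g_ctrl[1] + 2 * g_ctrl[2]      # risk alleles in controls
    d = 2 * g_ctrl.sum() - c           # other alleles in controls
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def simulated_power(maf, grr, f0, n_cases, n_ctrls, n_rep=500, crit=3.841, seed=0):
    """Fraction of replicates with chi2 > crit (3.841 = 95% quantile, 1 df)."""
    rng = np.random.default_rng(seed)
    case, ctrl = genotype_probs_given_status(maf, grr, f0)
    hits = 0
    for _ in range(n_rep):
        g_case = rng.multinomial(n_cases, case)   # genotypes simulated given phenotype
        g_ctrl = rng.multinomial(n_ctrls, ctrl)
        hits += int(allelic_chi2(g_case, g_ctrl) > crit)
    return hits / n_rep
```

Setting `grr=1.0` gives the null model, where the rejection rate should be close to the nominal 5%.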

In Bayesian networks, exact belief propagation is achieved through message-passing algorithms. These algorithms (e.g., inward and outward) provide only a recursive definition of the corresponding messages. In contrast, when working with hidden Markov models and their variants, one classically first defines these messages explicitly (the forward and backward quantities) and then derives all results and algorithms from them.
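
For reference, the explicit HMM definitions mentioned above are usually written as follows, with assumed notation: hidden states $X_i$, observations $Y_i$, initial law $\pi$, transition matrix $T$, and emission term $e_i(x) = P(Y_i \mid X_i = x)$.

$$
F_i(x) = P(X_i = x,\, Y_{1:i}), \qquad B_i(x) = P(Y_{i+1:n} \mid X_i = x)
$$

$$
F_1(x) = \pi(x)\, e_1(x), \qquad F_i(x) = \Big[\sum_{x'} F_{i-1}(x')\, T(x', x)\Big]\, e_i(x)
$$

$$
B_n(x) = 1, \qquad B_i(x) = \sum_{x'} T(x, x')\, e_{i+1}(x')\, B_{i+1}(x')
$$

$$
P(Y_{1:n}) = \sum_x F_n(x), \qquad P(X_i = x \mid Y_{1:n}) = \frac{F_i(x)\, B_i(x)}{P(Y_{1:n})}
$$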

We propose new recursive formulas to compute the exact value of the Kullback-Leibler distance (KLD) between two general Hidden Markov Trees (HMTs). For homogeneous HMTs with regular topology, such as homogeneous Hidden Markov Models (HMMs), we obtain a closed-form expression for the KLD when no evidence is given. We generalize our recursive formulas to the case of HMMs conditioned on the observable variables.

**Affiliations:**

¹MAP5, ²LJK
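
One quantity of this kind can be computed recursively for ordinary HMMs. Assuming the KLD is taken between the joint laws of the hidden and observed sequences, the chain rule splits it into an initial term plus, at each position, expected transition and emission divergences under the first model's state marginals. This is an illustrative sketch for discrete emissions with strictly positive probabilities, not the paper's HMT formulas; the names are hypothetical.

```python
import numpy as np

# Illustrative sketch via the chain rule; names are hypothetical.
def kl_rows(P, Q):
    """KL divergence between matching rows (or two vectors) of positive distributions."""
    return np.sum(P * np.log(P / Q), axis=-1)

def hmm_joint_kld(pi1, T1, E1, pi2, T2, E2, n):
    """KL(P1 || P2) between joint laws of (hidden, observed) sequences of length n."""
    marg = np.asarray(pi1, dtype=float)                 # P1(X_1 = x)
    kld = kl_rows(pi1, pi2) + marg @ kl_rows(E1, E2)    # X_1 and Y_1 terms
    for _ in range(1, n):
        kld += marg @ kl_rows(T1, T2)                   # transition out of X_{t-1}
        marg = marg @ T1                                # P1(X_t = x)
        kld += marg @ kl_rows(E1, E2)                   # emission at X_t
    return float(kld)
```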

We present two novel approaches for the computation of the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem and are two-part algorithms. The first approach relies on a partial recursion after a fast computation of the second largest eigenvalue of the transition matrix of a Markov chain embedding.

**Affiliations:**

¹MAP5

**Category:** Mathematics - Probability
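
As a baseline for the exact pattern-count distribution, the sketch below uses the classical Markov chain embedding with a simple dynamic program: the state records the longest prefix of the pattern matched so far (a KMP-style automaton), and the number of occurrences is tracked alongside it. It assumes an i.i.d. letter model rather than a Markov source and ignores the sparsity and eigenvalue-based acceleration described above; the names are hypothetical.

```python
import numpy as np

# Illustrative baseline (i.i.d. letters, dense DP); names are hypothetical.
def kmp_automaton(pattern, alphabet):
    """delta[state][letter] -> new state, where state = length of matched prefix."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for a in alphabet:
            s = pattern[:state] + a
            k = min(len(s), m)
            # longest suffix of s that is also a prefix of the pattern
            while k > 0 and s[len(s) - k:] != pattern[:k]:
                k -= 1
            delta[state][a] = k
    return delta

def count_distribution(pattern, probs, n):
    """P(N = c), c = 0..n, for overlapping occurrences in an i.i.d. sequence of length n."""
    alphabet = list(probs)
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)
    dist = np.zeros((m + 1, n + 1))       # dist[state, count]
    dist[0, 0] = 1.0
    for _ in range(n):
        new = np.zeros_like(dist)
        for state in range(m + 1):
            for a in alphabet:
                nxt = delta[state][a]
                if nxt == m:              # an occurrence ends here: count + 1
                    new[nxt, 1:] += probs[a] * dist[state, :-1]
                else:
                    new[nxt, :] += probs[a] * dist[state, :]
        dist = new
    return dist.sum(axis=0)
```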

In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-state sequence generated by a Markov source. We derive efficient algorithms that handle both low- and high-complexity patterns and both homogeneous and heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based developments (namely, Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian or Poisson approximations.
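
The first moment illustrates the starting point of such developments. Assuming a homogeneous first-order Markov chain with initial law μ1 and transition matrix P, the expected number of (possibly overlapping) occurrences of a word w of length m in X_1..X_n is ∑_{i=1}^{n-m+1} P(X_i = w_1) ∏_j P(w_j, w_{j+1}). The sketch computes only this mean, not the higher moments or the Edgeworth and Gram-Charlier corrections; the function name is hypothetical.

```python
import numpy as np

# Illustrative sketch (first moment only, homogeneous chain); name is hypothetical.
def expected_count(word, P, mu1, n):
    """Exact E[N], N = number of (possibly overlapping) occurrences of `word` in X_1..X_n.

    word : list of state indices; P : transition matrix; mu1 : law of X_1.
    """
    m = len(word)
    tail = 1.0                            # P(rest of the word | its first letter)
    for a, b in zip(word[:-1], word[1:]):
        tail *= P[a, b]
    marg = np.asarray(mu1, dtype=float)   # P(X_i = .), starting at i = 1
    total = 0.0
    for _ in range(n - m + 1):            # occurrence starting at position i
        total += marg[word[0]] * tail
        marg = marg @ P
    return float(total)
```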