# Physics - Data Analysis; Statistics and Probability Publications (50)

In real-world applications, observations are often constrained to a small fraction of a system. Such spatial subsampling can be caused by the inaccessibility or the sheer size of the system, and cannot be overcome by longer sampling. Spatial subsampling can strongly bias inferences about a system's aggregated properties.

Small-angle neutron scattering (SANS) is an experimental technique to detect material structures in the nanometer to micrometer range. The solution of the structural model constructed from SANS strongly depends on the accuracy of the reduced data. The time-of-flight (TOF) SANS data are dependent on the wavelength of the pulsed neutron source.

We present the results from the first measurements of the Time-Correlated Pulse-Height (TCPH) distributions from a 4.5 kg sphere of $\alpha$-phase weapons-grade plutonium metal in five configurations: bare, reflected by 1.27 cm and 2.

In time domain astronomy, recurrent transients present a special problem: how to infer total populations from limited observations. Monitoring observations may give a biased view of the underlying population due to limitations on observing time, visibility and instrumental sensitivity. A similar problem exists in the life sciences, where animal populations (such as migratory birds) or disease prevalence must be estimated from sparse and incomplete data.

The problem of information fusion from multiple data-sets acquired by multimodal sensors has drawn significant research attention over the years. In this paper, we focus on a particular problem setting consisting of a physical phenomenon or a system of interest observed by multiple sensors. We assume that all sensors measure some aspects of the system of interest with additional sensor-specific and irrelevant components.

We consider two methods of fitting parameters to datasets with systematic errors. Such a procedure is needed to combine results from different experiments where the elements within each dataset share a common systematic uncertainty. We show the methods are equivalent.

The transverse momentum ($p_T$) spectra from heavy-ion collisions at intermediate momenta are described by non-extensive statistical models. Assuming a fixed relative variance of the temperature fluctuating event by event or alternatively a fixed mean multiplicity in a negative binomial distribution (NBD), two different linear relations emerge between the temperature, $T$, and the Tsallis parameter $q-1$. Our results qualitatively agree with that of G.
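
The fitted relations between $T$ and $q-1$ are not reproduced in this snippet, but the Tsallis-Pareto spectrum shape underlying such fits is standard; a minimal sketch (parameter values are illustrative magnitudes, not the paper's fitted values):

```python
import math

def tsallis_pt(pt, T, q):
    """Tsallis-Pareto spectrum shape (unnormalized):
    f(pT) = [1 + (q - 1) * pT / T] ** (-1 / (q - 1)),
    which reduces to the Boltzmann factor exp(-pT / T) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(-pt / T)
    return (1.0 + (q - 1.0) * pt / T) ** (-1.0 / (q - 1.0))

# Illustrative values: T = 0.16 GeV, q = 1.1 (typical magnitudes for
# intermediate-pT heavy-ion fits, assumed here for demonstration only).
spectrum = [tsallis_pt(0.1 * i, 0.16, 1.1) for i in range(1, 30)]
```

The power-law tail (large $p_T$) and the exponential limit (small $q-1$) are both visible in this one formula, which is why the parameter $q-1$ carries the non-extensivity information.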

In this report, we applied the expectation-maximization (EM) method described by Philips et al. [1] to recover two-dimensional (2D) structure from multiple sparse signal images in random orientation. The derivation of the EM algorithm for 2D image reconstruction is presented in detail. Data sets with an average of 40 photons per frame were successfully classified by orientation.

We develop an algorithm for model selection which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system. The innovation circumvents a disadvantage of standard model selection, which typically limits the number of candidate models considered due to the intractability of computing information criteria. Using a recently developed sparse identification of nonlinear dynamics algorithm, the sub-selection of candidate models near the Pareto frontier allows for a tractable computation of AIC (Akaike information criterion) or BIC (Bayes information criterion) scores for the remaining candidate models.
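
For least-squares fits with Gaussian errors, the AIC and BIC scores mentioned above reduce to simple closed forms in the residual sum of squares; a minimal sketch (the model sizes and residuals are made up for illustration, not taken from the paper):

```python
import math

def aic(rss, n, k):
    # Akaike information criterion for a least-squares fit with
    # Gaussian errors: AIC = 2k + n * ln(RSS / n)
    return 2 * k + n * math.log(rss / n)

def bic(rss, n, k):
    # Bayes information criterion: BIC = k * ln(n) + n * ln(RSS / n)
    return k * math.log(n) + n * math.log(rss / n)

# Toy comparison of two candidate models on n = 100 points: a small
# 3-parameter model versus a 7-parameter model with slightly lower RSS.
n = 100
scores = {"small": aic(12.0, n, 3), "large": aic(11.5, n, 7)}
best = min(scores, key=scores.get)
```

Because the penalty term grows with the parameter count, the larger model's marginal RSS improvement does not pay for its extra parameters here, which is exactly the trade-off a Pareto-frontier sub-selection exploits.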

Many state-of-the-art methods for the thermodynamic and kinetic characterization of large and complex biomolecular systems by simulation rely on ensemble approaches, where data from large numbers of relatively short trajectories are integrated. In this context, Markov state models (MSMs) are extremely popular because they can be used to compute stationary quantities and long-time kinetics from ensembles of short simulations, provided that these short simulations are in "local equilibrium" within the MSM states. However, in the more than 15 years since the inception of MSMs, it has remained an open and controversial question how deviations from local equilibrium can be detected, whether these deviations induce a practical bias in MSM estimation, and how to correct for them.

The images obtained from cryo-electron microscopy are projections of many heterogeneous instances of the object under study (e.g., a virus).

Numerical and experimental turbulence simulations are nowadays reaching big-data scale, thus requiring refined investigative tools for appropriate statistical analyses and data mining. We present a new approach based on complex network theory, offering a powerful framework to explore complex systems with a huge number of interacting elements. Although interest in complex networks has been increasing in recent years, few studies have applied them to turbulence.

First-passage time problems are ubiquitous across many fields of study including transport processes in semiconductors and biological synapses, evolutionary game theory and percolation. Despite their prominence, first-passage time calculations have proven to be particularly challenging. Analytical results to date have often been obtained under strong conditions, leaving most of the exploration of first-passage time problems to direct numerical computations.
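
The "direct numerical computations" mentioned above typically mean Monte Carlo simulation; a minimal sketch for a symmetric random walk exiting an interval, one of the few cases with a known answer (mean exit time $a^2$ from barriers at $\pm a$) against which such simulations can be checked (function names and parameters are ours):

```python
import random

def exit_time(a, rng):
    """Number of steps a +/-1 random walk started at 0 takes
    until |x| first reaches the barrier a."""
    x, t = 0, 0
    while abs(x) < a:
        x += rng.choice((-1, 1))
        t += 1
    return t

rng = random.Random(42)
a = 5
times = [exit_time(a, rng) for _ in range(2000)]
mean_fpt = sum(times) / len(times)   # theory: a**2 = 25
```

Analytical results like $\langle T \rangle = a^2$ exist only for such idealized geometries; for anything more structured, the loop above is often all that is available.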

We present an approach for real-time change detection in the transient phases of complex dynamical systems, based on tracking the local phase and amplitude synchronization among the components of a univariate time series signal derived via Intrinsic Time-scale Decomposition (ITD), a nonlinear, non-parametric analysis method. We investigate the properties of ITD components and show that the expected level of phase synchronization at a given change point may be enhanced more than fourfold when we employ multiple intrinsic components. Next, we introduce a concept of maximal mutual agreement to identify the set of ITD components that are most likely to capture the information about dynamical changes of interest, and define an InSync statistic to capture this local information.

An algorithm is described that can generate random variants of a time series or image while preserving the probability distribution of original values and the pointwise Hölder regularity. Thus, it preserves the multifractal properties of the data. Our algorithm is similar in principle to well-known algorithms based on the preservation of the Fourier amplitude spectrum and original values of a time series.
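
The well-known Fourier-amplitude-preserving baseline that this abstract compares against can be sketched in a few lines (naive DFT for clarity; note this classic surrogate preserves the amplitude spectrum but, unlike the paper's algorithm, not the pointwise Hölder regularity):

```python
import cmath
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def phase_randomized_surrogate(x, rng):
    """Classic surrogate: keep every amplitude |X_k|, randomize the
    phases with the conjugate symmetry required for a real result."""
    n = len(x)
    X = dft(x)
    for k in range(1, n // 2):
        phi = rng.uniform(0, 2 * cmath.pi)
        X[k] *= cmath.exp(1j * phi)
        X[n - k] *= cmath.exp(-1j * phi)   # mirror bin gets the conjugate
    # inverse DFT, keeping the (numerically) real part
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

rng = random.Random(0)
series = [rng.gauss(0, 1) for _ in range(32)]
surr = phase_randomized_surrogate(series, rng)
```

The DC bin `X[0]` and the Nyquist bin are left untouched, so the surrogate shares the original's mean and full amplitude spectrum while scrambling its temporal structure.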

Modern technology for producing extremely bright and coherent X-ray laser pulses provides the possibility to acquire a large number of diffraction patterns from individual biological nanoparticles, including proteins, viruses, and DNA. These two-dimensional diffraction patterns can be practically reconstructed and retrieved down to a resolution of a few ångström. In principle, a sufficiently large collection of diffraction patterns will contain the required information for a full three-dimensional reconstruction of the biomolecule.

Evaluating theories in physics used to be easy. Our theories provided very distinct predictions. Experimental uncertainties were small enough that worrying about epistemological problems was not necessary.

Characterizing and controlling nonlinear, multi-scale phenomena play important roles in science and engineering. Cluster-based reduced-order modeling (CROM) was introduced to exploit the underlying low-dimensional dynamics of complex systems. CROM builds a data-driven discretization of the Perron-Frobenius operator, resulting in a probabilistic model for ensembles of trajectories.

The dynamics of the flaring loops in active region (AR) 11429 are studied. The observed dynamics consist of several evolution stages of the flaring loop system during both the ascending and descending phases of the registered M-class flare.

Williams and Beer (2010) proposed a nonnegative mutual information decomposition, based on the construction of information gain lattices, which allows separating the information that a set of variables contains about another into components interpretable as the unique information of one variable, or as redundant and synergistic components. In this work we extend the framework of Williams and Beer (2010), focusing on the lattices that underpin the decomposition. We generalize the type of constructible lattices and examine the relations between the terms in different lattices, for example relating bivariate and trivariate decompositions.

A hundred years after Smoluchowski introduced his approach to stochastic processes, they are now at the basis of mathematical and physical modeling in cellular biology: they are used, for example, to analyse and extract features from large numbers (tens of thousands) of single-molecule trajectories, or to study the diffusive motion of molecules, proteins or receptors. Stochastic modeling is a new step in large-data analysis that serves to extract cell-biology concepts. We review here Smoluchowski's approach to stochastic processes and provide several applications: coarse-graining diffusion, polymer models for understanding nuclear organization, and finally the stochastic jump dynamics of telomeres across cell division and stochastic gene regulation.

Nowadays, modern electron microscopes deliver images at atomic scale. The precise atomic structure encodes information about material properties. Thus, an important ingredient in the image analysis is to locate the centers of the atoms shown in micrographs as precisely as possible.

This paper designs non-linear frequency modulation (NLFM) signals for Chebyshev, Kaiser, Taylor, and raised-cosine power spectral densities (PSDs). Then, the variation of peak sidelobe level with respect to mainlobe width for these four window functions is analyzed. It is demonstrated that reduction of the sidelobe level in an NLFM signal leads to an increase in the mainlobe width of the autocorrelation function.

We developed a minimum-gradient-based method to track ridge features in 2D image plots, a typical data representation in many momentum-resolved spectroscopy experiments. Through both analytic formulation and numerical simulation, we compare this new method with existing DC (distribution curve)-based and higher-order derivative-based analyses. We find that the new method has good noise resilience and enhanced contrast, especially for weak-intensity features, while preserving the quantitative local-maxima information of the raw image.

A symmetry-guided time redefinition may enhance and simplify analyses of historical series displaying recurrent patterns. Enforcing a simple-scaling symmetry with Hurst exponent 1/2 and the requirement of increments' stationarity, we identify a time-definition protocol in the financial case. The novel time scale, constructed through a systematic application of the Kolmogorov-Smirnov criterion to extensive data of the S&P500 index, lays a bridge between the regime of minutes and that of several days in physical time.
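
The Kolmogorov-Smirnov criterion used for the time-scale construction compares two empirical distributions through their maximal CDF distance; a minimal two-sample KS statistic (the paper's systematic application to S&P500 increments is, of course, far beyond this sketch):

```python
import bisect

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic:
    sup_t |F_x(t) - F_y(t)|, evaluated over the pooled sample points."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / nx   # empirical CDF of x at t
        fy = bisect.bisect_right(ys, t) / ny   # empirical CDF of y at t
        d = max(d, abs(fx - fy))
    return d
```

A small statistic means the two samples are compatible with a common distribution, which is exactly the condition a simple-scaling time redefinition must satisfy across time horizons.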

This paper contributes to the field of modeling and hindcasting of the total solar irradiance (TSI) based on different proxy data that extend further back in time than the TSI that is measured from satellites. We introduce a simple method to analyze persistent frequency-dependent correlations (FDCs) between the time series and use these correlations to hindcast missing historical TSI values. We try to avoid arbitrary choices of the free parameters of the model by computing them using an optimization procedure.

The neural-network-based approach presented in this paper was developed for the analysis of peak profiles and for the prediction of base-profile characteristics, such as width, asymmetry, and asymptotics ("peak tails") of the observed distributions. The obtained parameters can be used as initial parameters in peak-decomposition applications.

Machine learning (ML) algorithms have been employed in the problem of classifying signal and background events with high accuracy in particle physics. In this paper, we use a widespread ML technique, namely *stacked generalization*, for the task of discovering a new neutral Higgs boson in gluon fusion. We found that, while demanding far less computational effort, *stacking* ML algorithms performs almost as well as deep neural networks (DNNs) trained exclusively on kinematic distributions for the same task, by building either a highly discriminating linear model or a shallower neural network with stacked ML outputs.

We propose a nonparametric approach for probabilistic prediction of the AL index trained with AL and solar wind ($v B_z$) data. Our framework relies on the diffusion forecasting technique, which views AL and $v B_z$ data as observables of an autonomous, ergodic, stochastic dynamical system operating on a manifold. Diffusion forecasting builds a data-driven representation of the Markov semigroup governing the evolution of probability measures of the dynamical system.

We present a weighted estimator of the covariance and correlation in bipartite complex systems with a double layer of heterogeneity. The advantage provided by the weighted estimators lies in the fact that the unweighted sample covariance and correlation can be shown to possess a bias. Indeed, such a bias affects real bipartite systems, and, for example, we report its effects on two empirical systems, one social and the other biological.
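
The paper's specific weighting scheme, tied to the double layer of heterogeneity, is not reproduced here; but the generic form of a weighted correlation shows where any such weights enter the estimator (a sketch under that assumption):

```python
def weighted_corr(x, y, w):
    """Weighted Pearson correlation with nonnegative weights w:
    weighted means, then weighted covariance over weighted std devs."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / (vx ** 0.5 * vy ** 0.5)

# Exactly linear data gives correlation 1 for any positive weights.
r = weighted_corr([1, 2, 3, 4], [3, 5, 7, 9], [2, 1, 1, 2])   # -> 1.0
```

Choosing the weights to counteract the heterogeneity of the two layers is the substance of the estimator the abstract describes.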

Following a paper in which the fundamental aspects of probabilistic inference were introduced by means of a toy experiment, details of the analysis of simulated long sequences of extractions are shown here. In fact, the striking performance of probability-based inference and forecasting, compared to those obtained by simple 'rules', might impress those practitioners who are usually underwhelmed by the philosophical foundation of the different methods. The analysis of the sequences also shows how the smallness of the probability of what has been actually observed, given the hypotheses of interest, is irrelevant for the purpose of inference.

Distribution network operators (DNOs) are increasingly concerned about the impact of low carbon technologies on the low voltage (LV) networks. More advanced metering infrastructures provide numerous opportunities for more accurate power flow analysis of the LV networks. However, such data may not be readily available for DNOs and in any case is likely to be expensive.

Multivariate goodness-of-fit and two-sample tests are important components of many nuclear and particle physics analyses. While a variety of powerful methods are available if the dimensionality of the feature space is small, such tests rapidly lose power as the dimensionality increases and the data inevitably become sparse. Machine learning classifiers are powerful tools capable of reducing highly multivariate problems into univariate ones, on which commonly used tests such as $\chi^2$ or Kolmogorov-Smirnov may be applied.

Starting from three-dimensional volume data of a granular packing, e.g. as obtained by X-ray Computed Tomography, we discuss methods to first detect the individual particles in the sample and then analyze their properties.

Bayesian nonparametric methods have recently transformed emerging areas within data science. One such promising method, the infinite hidden Markov model (iHMM), generalizes the HMM which itself has become a workhorse in single molecule data analysis. The iHMM goes beyond the HMM by learning the number of states in addition to all other parameters learned by the HMM.

The hidden Markov model (HMM) has been a workhorse of single-molecule data analysis and is now commonly used as a standalone tool in time series analysis or in conjunction with other analysis methods such as tracking. Here we provide a conceptual introduction to an important generalization of the HMM which is poised to have a deep impact across biophysics: the infinite hidden Markov model (iHMM). As a modeling tool, iHMMs can analyze sequential data without setting a specific number of states a priori, as required for the traditional (finite) HMM.
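
The finite HMM that the iHMM generalizes rests on the forward algorithm for the observation likelihood; a minimal sketch with a fixed 2-state chain (all matrices are illustrative, and the iHMM's whole point is that the number of states here would be learned rather than fixed):

```python
def hmm_likelihood(obs, pi, A, B):
    """Forward algorithm: P(obs) for an HMM with initial distribution
    pi[i], transition matrix A[i][j], and emission matrix B[i][symbol]."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(alpha))) * B[j][o]
                 for j in range(len(alpha))]
    return sum(alpha)

pi = [0.6, 0.4]                     # initial state probabilities
A = [[0.7, 0.3], [0.2, 0.8]]        # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]        # emission probabilities, symbols 0/1
lik = hmm_likelihood([0, 1, 1, 0], pi, A, B)
```

A useful sanity check: if both states emit identically, the state sequence is unidentifiable and the likelihood collapses to a plain product of emission probabilities.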

Fluorescent Nuclear Track Detectors (FNTDs) offer a superior, sub-micrometer spatial resolution that allows for single particle track detection. However, when assessing particle fluence from the measured track positions, discrimination of actual fluence patterns from stochastic fluctuations is necessary due to spatial randomness in particle arrival. This work quantifies the spatial limits of fluence-based dosimetry of (heavy) charged particles and presents the use of tools to detect deviation from homogeneous (true) fluence in measured data.
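
A standard way to test measured positions against complete spatial randomness is the index of dispersion of counts in equal cells, which is close to 1 for a homogeneous Poisson pattern; a 1D sketch on simulated uniform points (not FNTD data, and not necessarily the tool the paper uses):

```python
import random

def dispersion_index(points, n_cells):
    """Variance-to-mean ratio of counts in n_cells equal bins of [0, 1):
    ~1 for complete spatial randomness, >1 for clustered patterns."""
    counts = [0] * n_cells
    for p in points:
        counts[min(int(p * n_cells), n_cells - 1)] += 1
    m = sum(counts) / n_cells
    var = sum((c - m) ** 2 for c in counts) / (n_cells - 1)
    return var / m

rng = random.Random(7)
uniform_pts = [rng.random() for _ in range(400)]
d = dispersion_index(uniform_pts, 100)   # close to 1 for random arrivals
```

A measured fluence pattern whose dispersion index deviates significantly from 1 signals structure beyond the unavoidable Poisson fluctuations of particle arrival.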

We develop efficient ways to consider and correct for the effects of hidden units for the paradigmatic case of the inverse kinetic Ising model with fully asymmetric couplings. We identify two sources of error in reconstructing the connectivity among the observed units while ignoring part of the network. One leads to a systematic bias in the inferred parameters, whereas the other involves correlations between the visible and hidden populations and has a magnitude that depends on the coupling strength.

Across a far-reaching diversity of scientific and industrial applications, a general key problem involves relating the structure of time-series data to a meaningful outcome, such as detecting anomalous events from sensor recordings, or diagnosing patients from physiological time-series measurements like heart rate or brain activity. Currently, researchers must devote considerable effort manually devising, or searching for, properties of their time series that are suitable for the particular analysis problem at hand. Addressing this non-systematic and time-consuming procedure, here we introduce a new tool, hctsa, that selects interpretable and useful properties of time series automatically, by comparing implementations over 7700 time-series features drawn from diverse scientific literatures.
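
What an "interpretable time-series property" means in practice can be seen from the most elementary members of such feature libraries; a sketch of two (the function names are ours, not hctsa identifiers):

```python
def lag1_autocorr(x):
    """Lag-1 autocorrelation: near +1 for smooth trends,
    near -1 for rapidly alternating signals."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def std_dev(x):
    """Population standard deviation, a basic spread feature."""
    m = sum(x) / len(x)
    return (sum((v - m) ** 2 for v in x) / len(x)) ** 0.5

alternating = [1, -1] * 5           # strongly anti-correlated signal
acf1 = lag1_autocorr(alternating)   # -> -0.9 for this length-10 series
```

hctsa's contribution is not any single such feature but the systematic comparison across thousands of them, replacing the manual search the abstract describes.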

Information geometry can be used to understand and optimize Higgs measurements at the LHC. The Fisher information encodes the maximum sensitivity of observables to model parameters for a given experiment. Applied to higher-dimensional operators, it defines the new physics reach of any LHC signature.
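
For the simplest counting measurement, the Fisher information and the resulting Cramér-Rao sensitivity bound are one-liners; a sketch with a Poisson signal-plus-background model (the values of s and b are illustrative, not from any LHC analysis):

```python
import math

def fisher_info_poisson(theta, s, b):
    """Fisher information for counts n ~ Poisson(mu), mu = s*theta + b:
    I(theta) = (d mu / d theta)^2 / mu = s^2 / (s*theta + b)."""
    return s ** 2 / (s * theta + b)

def cramer_rao_bound(theta, s, b):
    # Smallest achievable standard deviation of an unbiased estimator.
    return 1.0 / math.sqrt(fisher_info_poisson(theta, s, b))

info = fisher_info_poisson(1.0, s=10.0, b=5.0)   # 100/15
```

The same logic, promoted to many observables and many operator coefficients, gives the Fisher information matrix whose geometry the abstract refers to.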

The connection between domain relaxations at individual scales and the collective heterogeneous response in non-equilibrium systems is a topic of profound interest in recent times. In a model system of a constantly driven, oppositely charged binary colloidal suspension, we probe such relaxations as the elongated lanes of like charges interact with increasing field in the orthogonal plane, using Brownian Dynamics simulations. We show that the system undergoes a structural and dynamical cross-over: from an initial fast-relaxing homogeneous phase to a heterogeneous lane phase following a slow relaxation via an intermediate phase with mixed relaxation.

Reconstruction of the structure and parameters of a graphical model from binary samples is a problem of practical importance in a variety of disciplines, ranging from statistical physics and computational biology to image processing and machine learning. The focus of the research community has shifted towards developing universal reconstruction algorithms which are both computationally efficient and require the minimal amount of expensive data. We introduce a new method, Interaction Screening, which accurately estimates the model parameters using local optimization problems.

We address the problem of semi-supervised learning in relational networks, networks in which nodes are entities and links are the relationships or interactions between them. Typically this problem is confounded with the problem of graph-based semi-supervised learning (GSSL), because both problems represent the data as a graph and predict the missing class labels of nodes. However, not all graphs are created equal.

Optical entropy sources show great promise for the task of random number generation, but commonly used evaluation practices are ill-suited to quantify their performance. In this Commentary we propose a new approach to quantifying entropy generation which provides greater insights and understanding of the optical sources of randomness.

The process of doing science under uncertainty is illustrated with a toy experiment in which the inferential and the forecasting aspects are both present. The fundamental aspects of probabilistic reasoning, also relevant in real-life applications, arise quite naturally, and the resulting discussion among non-ideologized, free-minded people offers an opportunity for clarifications.

**Authors:** Luca M. Ghiringhelli^{1}, Jan Vybiral^{2}, Emre Ahmetcik^{3}, Runhai Ouyang^{4}, Sergey V. Levchenko^{5}, Claudia Draxl^{6}, Matthias Scheffler^{7}

**Affiliations:** ^{1,3,4,5,6,7}Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin-Dahlem, Germany; ^{2}Charles University, Department of Mathematical Analysis, Prague, Czech Republic

The availability of big data in materials science offers new routes for analyzing materials properties and functions and achieving scientific understanding. Finding structure in these data that is not directly visible by standard tools, and exploiting the scientific information, requires new and dedicated methodology based on approaches from statistical learning, compressed sensing, and other recent methods from applied mathematics, computer science, statistics, signal processing, and information science. In this paper, we explain and demonstrate a compressed-sensing based methodology for feature selection, specifically for discovering physical descriptors, i.

Muography techniques applied to geological structures have greatly improved in the past ten years. Recent applications demonstrate the interest of the method not only to perform structural imaging but also to monitor the dynamics of inner movements like magma ascent inside volcanoes or density variations in hydrothermal systems. Muography time-resolution has been studied thanks to dedicated experiments, e.

A prototype model of a stochastic one-variable system with a linear restoring force driven by two cross-correlated multiplicative and additive Gaussian white noises was considered earlier [S. I. Denisov et al.

We study generalised restricted Boltzmann machines with generic priors for units and weights, interpolating between Boolean and Gaussian variables. We present a complete analysis of the replica symmetric phase diagram of these models, which can be regarded as generalised Hopfield models. We show the way the paramagnetic phase boundary is directly related to the optimal size of the training set necessary for good generalisation in a teacher-student scenario.

Barabási-Albert's 'scale-free' model is the accepted theory of the evolution of real-world networks. Careful comparison of the theory with a wide range of real-world graphs, however, has identified shortcomings in the predictions of the theory when compared to the data. In particular, the exponent $\gamma$ of the power-law distribution of degree is predicted by the model to be identically 3, whereas the data has values of $\gamma$ between 1.
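
The exponent $\gamma$ itself is usually estimated by maximum likelihood rather than by fitting a log-log histogram; for a continuous power-law tail the estimator is a one-liner (synthetic data shown below, not real network degrees):

```python
import math
import random

def powerlaw_mle(xs, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent:
    gamma_hat = 1 + n / sum(ln(x / xmin)), over samples x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# Synthetic power-law samples with gamma = 2.5 via inverse transform:
# x = xmin * (1 - u) ** (-1 / (gamma - 1)) for uniform u.
rng = random.Random(1)
gamma_true, xmin = 2.5, 1.0
data = [xmin * (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0))
        for _ in range(5000)]
gamma_hat = powerlaw_mle(data, xmin)   # close to 2.5
```

Estimates of this kind are what reveal the discrepancy the abstract points to: measured exponents well below the value 3 predicted by the model.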