Physics - Data Analysis; Statistics and Probability Publications (50)



The theoretical description of non-renewal stochastic systems is a challenge. Analytical results are often not available or can only be obtained under strong conditions, limiting their applicability. Also, numerical results have mostly been obtained by ad-hoc Monte Carlo simulations, which are usually computationally expensive when a high degree of accuracy is needed. Read More

Relying on the multifractal behavior of pulsar timing residuals (PTRs), we examine the capability of Multifractal Detrended Fluctuation Analysis (MF-DFA) and Multifractal Detrending Moving Average Analysis (MF-DMA), modified by Singular Value Decomposition (SVD) and Adaptive Detrending (AD), to detect the footprint of gravitational waves (GWs) superimposed on PTRs. These methods enable us to clarify the type of GWs, which is related to the value of the Hurst exponent. We introduce three strategies, based on the generalized Hurst exponent and the width of the singularity spectrum, to determine the dimensionless amplitude of GWs. Read More

Plant emission of volatile organic compounds (VOCs) is involved in a wide class of ecological functions, as VOCs play a crucial role in plant interactions with biotic and abiotic factors. Accordingly, they vary widely across species and underpin differences in ecological strategy. In this paper, VOCs spontaneously emitted by 109 plant species (belonging to 56 different families) have been qualitatively and quantitatively analysed in order to classify plant species. Read More

Random sampling via a quantile function Q(u) is a popular technique, but two very common sources of numerical instability are often overlooked: (i) quantile functions tend to be ill-conditioned when u → 1 and (ii) feeding them uniformly spaced u can make them ill-conditioned as u → 0. These flaws undermine the tails of Q(u)'s distribution, and both flaws are present in the polar method for normal sampling (used by GNU's std::normal_distribution and numpy.random. Read More
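
As a rough illustration of point (i), the sketch below uses the unit exponential distribution, whose quantile function is Q(u) = -ln(1 - u). The choice of distribution and the helper names `exp_quantile` and `relative_condition` are assumptions made for this example and do not come from the abstract. It shows how the relative condition number grows as u approaches 1, and that the largest double below 1 caps the largest sample that can be drawn.

```python
import math

def exp_quantile(u):
    """Quantile function of the unit exponential: Q(u) = -ln(1 - u)."""
    return -math.log1p(-u)

def relative_condition(u):
    """Relative condition number |u * Q'(u) / Q(u)|, with Q'(u) = 1 / (1 - u)."""
    return abs(u / ((1.0 - u) * exp_quantile(u)))

# The largest double strictly below 1 is 1 - 2**-53, so no sample can exceed
# Q(1 - 2**-53) = 53 * ln(2): the upper tail is cut off at this value.
u_max = 1.0 - 2.0 ** -53
largest_sample = exp_quantile(u_max)

# The condition number grows without bound as u approaches 1.
for u in (0.5, 0.999, 1.0 - 1e-12):
    print(u, relative_condition(u))
```

In this illustrative case a small relative error in u near 1 is amplified by orders of magnitude in Q(u), which is the kind of tail degradation the abstract describes.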

Recurrence networks and the associated statistical measures have become important tools in the analysis of time series data. In this work, we test how effective the recurrence network measures are in analyzing real-world data involving two main types of noise, white noise and colored noise. We use two prominent network measures as discriminating statistics for hypothesis testing using surrogate data for a specific null hypothesis that the data is derived from a linear stochastic process. Read More

The analysis of observed time series from nonlinear systems is usually done by making a time-delay reconstruction to unfold the dynamics on a multi-dimensional state space. An important aspect of the analysis is the choice of the correct embedding dimension. The conventional procedure used for this is either the method of false nearest neighbors or the saturation of some invariant measure, such as the correlation dimension. Read More
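
A time-delay reconstruction of the kind mentioned above can be sketched as follows. The function name `delay_embed` and its parameters `dim` (embedding dimension) and `tau` (delay in samples) are hypothetical names chosen for this example. The sketch only builds the delay vectors and does not select the embedding dimension.

```python
def delay_embed(x, dim, tau):
    """Return delay vectors [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]]."""
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for this dim and tau")
    return [[x[t + i * tau] for i in range(dim)] for t in range(n_vectors)]

# Toy series 0..9 embedded in 3 dimensions with a delay of 2 samples.
series = list(range(10))
vectors = delay_embed(series, dim=3, tau=2)
print(vectors[0], vectors[-1], len(vectors))
```

Criteria such as false nearest neighbors or the saturation of an invariant measure would then be evaluated on these vectors for increasing `dim`.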

Objective: The recent emergence and success of electroencephalography (EEG) in low-cost portable devices has opened the door to a new generation of applications processing a small number of EEG channels for health monitoring and brain-computer interfacing. These recordings are, however, contaminated by many sources of noise degrading the signals of interest, thus compromising the interpretation of the underlying brain state. In this work, we propose a new data-driven algorithm to effectively remove ocular and muscular artifacts from single-channel EEG: the surrogate-based artifact removal (SuBAR). Read More

Complex networks are usually characterized in terms of their topological, spatial, or information-theoretic properties and combinations of the associated metrics are used to discriminate networks into different classes or categories. However, even with the present variety of characteristics at hand it still remains a subject of current research to appropriately quantify a network's complexity and correspondingly discriminate between different types of complex networks, like infrastructure or social networks, on such a basis. Here, we explore the possibility to classify complex networks by means of a statistical complexity measure that has formerly been successfully applied to distinguish different types of chaotic and stochastic time series. Read More

The angle of rotation of any target about the radar line of sight (LOS) is known as the polarization orientation angle. The orientation angle is found to be non-zero for undulating terrains and man-made targets oriented away from the radar LOS. This effect is more pronounced at lower frequencies (eg. Read More

Visibility algorithms are a family of geometric and ordering criteria by which a real-valued time series of N data is mapped into a graph of N nodes. This graph has been shown to often inherit in its topology non-trivial properties of the series structure, and can thus be seen as a combinatorial representation of a dynamical system. Here we explore in some detail the relation between visibility graphs and symbolic dynamics. Read More
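
A minimal sketch of one member of this family, the natural visibility criterion, is given below. It assumes equally spaced samples indexed 0..N-1, and the function name `natural_visibility_edges` is hypothetical. Two samples a < b are linked when every intermediate sample lies strictly below the straight line joining them.

```python
def natural_visibility_edges(y):
    """Return the edge set of the natural visibility graph of the series y."""
    n = len(y)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # Sample c must lie strictly below the line from (a, y[a]) to (b, y[b]).
            visible = all(
                y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges

print(sorted(natural_visibility_edges([3, 1, 2])))
print(sorted(natural_visibility_edges([1, 3, 1])))
```

This brute-force version costs O(N^3) in the worst case and is meant only to make the geometric criterion concrete. Neighbouring samples are always linked because no intermediate sample can block them.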

Stochastic dynamical systems with continuous symmetries arise commonly in nature and often give rise to coherent spatio-temporal patterns. However, because of their random locations, these patterns are not well captured by current order reduction techniques and a large number of modes is typically necessary for an accurate solution. In this work, we introduce a new methodology for efficient order reduction of such systems by combining (i) the method of slices, a symmetry reduction tool, with (ii) any standard order reduction technique, resulting in efficient mixed symmetry-dimensionality reduction schemes. Read More

In this letter, a methodology is proposed to improve the scattering powers obtained from model-based decomposition using Polarimetric Synthetic Aperture Radar (PolSAR) data. The novelty of this approach lies in utilizing the intrinsic information in the off-diagonal elements of the 3$\times$3 coherency matrix $\mathbf{T}$ represented in the form of complex correlation coefficients. Two complex correlation coefficients are computed between co-polarization and cross-polarization components of the Pauli scattering vector. Read More

Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two ubiquitous growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are standard statistical packages for estimating the structural properties of complex networks, there is no corresponding package when it comes to the estimation of growth mechanisms. Read More

Finding a fluorescent target in a biological environment is a common and pressing microscopy problem. This task is formally analogous to the canonical search problem. In ideal (noise-free, truthful) search problems, the well-known binary search is optimal. Read More
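
For the ideal case mentioned above, binary search can be sketched as below. The one-dimensional grid of cells and the truthful oracle `is_left_of` are assumptions made for this illustration, not the microscopy setting of the paper. Each noise-free yes/no answer halves the remaining search interval.

```python
def binary_search_target(n_cells, is_left_of):
    """Locate a target index in range(n_cells).

    is_left_of(m) must truthfully return True when the target index is < m.
    Returns the target index and the number of queries used.
    """
    lo, hi = 0, n_cells  # the target lies in the half-open interval [lo, hi)
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if is_left_of(mid):
            hi = mid
        else:
            lo = mid
    return lo, queries

target = 700
found, n_queries = binary_search_target(1024, lambda m: target < m)
print(found, n_queries)
```

With 1024 cells the search ends after 10 queries, i.e. log2(1024). Noisy or untruthful answers break this guarantee.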

We report an investigation of data analysis methods derived from other disciplines, which we applied to physics software systems. They concern the analysis of inequality, trend analysis and the analysis of diversity. The analysis of inequality exploits statistical methods originating from econometrics; trend analysis is typical of economics and environmental sciences; the analysis of diversity is based on concepts derived from ecology and treats software as an ecosystem. Read More

We present a method to reconstruct the complete statistical mode structure and optical losses of multimode conjugated optical fields using an experimentally measured joint photon-number probability distribution. We demonstrate that this method evaluates classical and non-classical properties using a single measurement technique and is well-suited for quantum mesoscopic state characterization. We obtain a nearly-perfect reconstruction of a field comprised of up to 10 modes based on a minimal set of assumptions. Read More

We present a toolbox of new techniques and concepts for the efficient forecasting of experimental sensitivities. These are applicable to a large range of scenarios in (astro-)particle physics, and based on the Fisher information formalism. Fisher information provides an answer to the question: what is the maximum extractable information from a given observation? Read More
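
For orientation, the standard single-parameter definition of Fisher information and the associated Cramér-Rao bound are recalled below. The notation ($L$ for the likelihood, $\theta$ for the parameter, $\hat{\theta}$ for an unbiased estimator) is assumed here and is not taken from the paper.

```latex
I(\theta) = \mathbb{E}\!\left[\left(\frac{\partial \ln L(x \mid \theta)}{\partial \theta}\right)^{2}\right],
\qquad
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)} .
```

The bound is the sense in which Fisher information limits what can be extracted from an observation: no unbiased estimator can have a variance below $1/I(\theta)$.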

We search for the signature of universal properties of extreme events, theoretically predicted for Axiom A flows, in a chaotic and high dimensional dynamical system by studying the convergence of GEV (Generalized Extreme Value) and GP (Generalized Pareto) shape parameter estimates to a theoretical value, expressed in terms of partial dimensions of the attractor, which are global properties. We consider a two layer quasi-geostrophic (QG) atmospheric model using two forcing levels, and analyse extremes of different types of physical observables (local, zonally-averaged energy, and the average value of energy over the mid-latitudes). Regarding the predicted universality, we find closer agreement in the shape parameter estimates only in the case of strong forcing, producing a highly chaotic behaviour, for some observables (the local energy at every latitude). Read More

Since the end of the past century, the effect of isotopic composition on some of the temperature fixed points has been recognized as important for the most accurate realizations of the ITS-90. In the original definition of the latter, dating back to 1990, only a generic reference was made to natural composition of the substances used for the realization of the fixed points, except for helium. The definition of a reference isotopic composition for three fixed points, e-H2, Ne and H2O, while eliminating the non-uniqueness of the Scale in this respect, induced detectable differences in the present and future realizations of the Scale, at the highest accuracy level, with respect to the previous realizations, when they affected the results of past key comparisons, namely the K1 and K1. Read More

Record statistics in stationary and non-stationary fractal time series are studied extensively. By calculating various quantities in record dynamics, we find some interesting results. In stationary fractional Gaussian noises, we observe a universal behavior for the whole range of Hurst exponents. Read More

The application of Stochastic Differential Equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a non-parametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. Read More

We introduce dynamic nested sampling: a generalisation of the nested sampling algorithm in which the number of "live points" varies to allocate samples more efficiently. In empirical tests the new method increases accuracy by up to a factor of ~8 for parameter estimation and ~3 for evidence calculation compared to standard nested sampling with the same number of samples, equivalent to speeding up the computation by factors of ~64 and ~9 respectively. In addition, unlike in standard nested sampling, more accurate results can be obtained by continuing the calculation for longer. Read More
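
The quoted speed-ups follow from the accuracy gains if the sampling error is assumed to shrink as the inverse square root of the number of samples $N$. That scaling is an assumption made for this short check, consistent with the numbers in the abstract. A method that is $g$ times more accurate at fixed $N$ then matches a standard run that uses $g^2$ times as many samples:

```latex
\sigma \propto N^{-1/2}
\;\;\Longrightarrow\;\;
\frac{N_{\text{standard}}}{N_{\text{dynamic}}} = g^{2},
\qquad
g \approx 8 \Rightarrow g^{2} \approx 64,
\qquad
g \approx 3 \Rightarrow g^{2} \approx 9 .
```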

In this paper we propose an ad-hoc construction of the Likelihood Function, in order to develop a data analysis procedure to be applied in atomic and nuclear spectral analysis. The classical Likelihood Function was modified, taking into account the underlying statistics of the phenomena studied, by inspecting the residuals of the fit, which should exhibit specific statistical properties. This new formulation was analytically developed, but the sought parameter should be evaluated numerically, since it cannot be obtained as a function of each one of the independent variables. Read More

Chiral effective field theory (EFT) predictions are necessarily truncated at some order in the EFT expansion, which induces an error that must be quantified for robust statistical comparisons to experiment. In previous work, a Bayesian model for truncation errors of perturbative expansions was adapted to EFTs. The model yields posterior probability distribution functions (pdfs) for these errors based on expectations of naturalness encoded in Bayesian priors and the observed order-by-order convergence pattern of the EFT. Read More

A framework integrating information theory and network science is proposed, giving rise to a potentially new area of network information science. By incorporating and integrating concepts such as complexity, coding, topological projections and network dynamics, the proposed network-based framework paves the way not only to extending traditional information science, but also to modeling, characterizing and analyzing a broad class of real-world problems, from language communication to DNA coding. Basically, an original network is supposed to be transmitted, with or without compaction, through a time-series obtained by sampling its topology by some network dynamics, such as random walks. Read More

The state of a stochastic process evolving over a time $t$ is typically assumed to lie on a normal distribution whose width scales like $t^{1/2}$. However, processes where the probability distribution is not normal and the scaling exponent differs from $\frac{1}{2}$ are known. The search for possible origins of such "anomalous" scaling and approaches to quantify them are the motivations for the work reported here. Read More
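
The normal $t^{1/2}$ scaling can be checked numerically with a minimal sketch such as the one below. It assumes an ensemble of simple symmetric random walks, and the function name `width_exponent` and its default parameters are illustrative. It estimates the scaling exponent as the least-squares slope of log(width) against log(t).

```python
import math
import random

def width_exponent(n_walkers=2000, n_steps=256, seed=0):
    """Estimate alpha in width ~ t**alpha for an ensemble of +/-1 random walks."""
    rng = random.Random(seed)
    sample_times = {2 ** k for k in range(2, int(math.log2(n_steps)) + 1)}
    positions = [0] * n_walkers
    log_t, log_w = [], []
    for t in range(1, n_steps + 1):
        positions = [p + rng.choice((-1, 1)) for p in positions]
        if t in sample_times:
            mean = sum(positions) / n_walkers
            var = sum((p - mean) ** 2 for p in positions) / n_walkers
            log_t.append(math.log(t))
            log_w.append(0.5 * math.log(var))  # log of the standard deviation
    # Ordinary least-squares slope of log(width) versus log(t).
    mt = sum(log_t) / len(log_t)
    mw = sum(log_w) / len(log_w)
    num = sum((a - mt) * (b - mw) for a, b in zip(log_t, log_w))
    den = sum((a - mt) ** 2 for a in log_t)
    return num / den

exponent = width_exponent()
print(exponent)
```

For this ordinary walk the estimate should lie close to 1/2. An "anomalous" process in the sense of the abstract would give a slope that differs from that value.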

We report on the parallel analysis of the periodic behaviour of coronal mass ejections (CMEs) based on 21 years [1996-2016] of observations with the SOHO/LASCO-C2 coronagraph, solar flares, prominences, and several proxies of solar activity. We consider values of the rates globally and, whenever possible, distinguish solar hemispheres and solar cycles 23 and 24. Periodicities are investigated using both frequency (periodogram) and time-frequency (wavelet) analysis. Read More

With sufficient time, double edge-swap Markov chain Monte Carlo (MCMC) methods are able to sample uniformly at random from many different and important graph spaces. For instance, for a fixed degree sequence, MCMC methods can sample any graph from: simple graphs; multigraphs (which may have multiedges); and pseudographs (which may have multiedges and/or multiple self-loops). In this note we extend these MCMC methods to 'multiloop-graphs', which allow multiple self-loops but not multiedges, and 'loopy-multigraphs', which allow multiedges and single self-loops. Read More
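
A double edge swap for the simple-graph case can be sketched as follows. This is an illustrative version only: it does not implement the multiloop-graph or loopy-multigraph extensions of the note, and the function names and parameters are hypothetical. A proposal that would create a self-loop or a multiedge is rejected, and the rejection still counts as a step of the chain.

```python
import random
from collections import Counter

def double_edge_swap(edges, n_steps, seed=0):
    """Run n_steps double edge-swap proposals on a simple undirected graph."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x  # pick one of the two possible rewirings at random
        if u == x or v == y:
            continue  # the swap would create a self-loop: reject
        new1, new2 = frozenset((u, x)), frozenset((v, y))
        if new1 in present or new2 in present:
            continue  # the swap would create a multiedge: reject
        present -= {frozenset(edges[i]), frozenset(edges[j])}
        present |= {new1, new2}
        edges[i], edges[j] = (u, x), (v, y)
    return edges

def degree_sequence(edges):
    """Count how many edge endpoints touch each node."""
    return Counter(node for edge in edges for node in edge)

ring = [(k, (k + 1) % 6) for k in range(6)]
swapped = double_edge_swap(ring, n_steps=200)
print(sorted(degree_sequence(swapped).items()))
```

Every accepted swap replaces edges (u, v) and (x, y) with (u, x) and (v, y), so each node keeps its degree. Whether the resulting chain samples a given graph space uniformly depends on details of the kind the abstract addresses.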

We consider the statistical properties of interaction parameter estimates obtained by the direct coupling analysis (DCA) approach to learning interactions from large data sets. Assuming that the data are generated from a random background distribution, we determine the distribution of inferred interactions. Two inference methods are considered: the L2 regularized naive mean-field inference procedure (regularized least squares, RLS), and the pseudo-likelihood maximization (plmDCA). Read More

We present the first algorithm for finding holes in high dimensional data that runs in polynomial time with respect to the number of dimensions. Previous algorithms are exponential. Finding large empty rectangles or boxes in a set of points in 2D and 3D space has been well studied. Read More

Power grid frequency control is a demanding task requiring expensive idle power plants to adapt the supply to the fluctuating demand. An alternative approach is controlling the demand side in such a way that certain appliances modify their operation to adapt to the power availability. This is especially important to achieve a high penetration of renewable energy sources. Read More

The results of the probabilistic analysis of the direct numerical simulations of irregular unidirectional deep-water waves are discussed. It is shown that an occurrence of large-amplitude soliton-like groups represents an extraordinary case, which is able to increase noticeably the probability of high waves even in moderately rough sea conditions. The ensemble of wave realizations should be large enough to take these rare events into account. Read More

Universal characteristics of road networks and traffic patterns can help to forecast and control traffic congestion. The antipersistence of traffic flow time series has been found for many data sets, but its relevance for congestion has been overlooked. Based on empirical data from motorways in Germany, we study how antipersistence of traffic flow time series impacts the duration of traffic congestion on a wide range of time scales. Read More

The growing field of large-scale time domain astronomy requires methods for probabilistic data analysis that are computationally tractable, even with large datasets. Gaussian Processes are a popular class of models used for this purpose but, since the computational cost scales as the cube of the number of data points, their application has been limited to relatively small datasets. In this paper, we present a method for Gaussian Process modeling in one-dimension where the computational requirements scale linearly with the size of the dataset. Read More

An experimental study was carried out to investigate the existence of a critical layer thickness in nanolayer coextrusion, under which no continuous layer is observed. Polymer films containing thousands of layers of alternating polymers with individual layer thicknesses below 100 nm have been prepared by coextrusion through a series of layer multiplying elements. Different films composed of alternating layers of poly(methyl methacrylate) (PMMA) and polystyrene (PS) were fabricated with the aim to reach individual layer thicknesses as small as possible, varying the number of layers, the mass composition of both components and the final total thickness of the film. Read More

The success of the various secondary operations involved in the production of particulate products depends on the production of particles with a desired size and shape from a previous primary operation such as crystallisation. This is because these properties of size and shape affect the behaviour of the particles in the secondary processes. The size and the shape of the particles are very sensitive to the conditions of the crystallisation processes, and so control of these processes is essential. Read More

In a wide range of complex networks, the links between the nodes are temporal and may sporadically appear and disappear. This temporality is fundamental to analyze the formation of paths within such networks. Moreover, the presence of the links between the nodes is a random process induced by nature in many real-world networks. Read More

A sparse modeling approach is proposed for analyzing scanning tunneling microscopy topography data, which contains numerous peaks corresponding to surface atoms. The method, based on the relevance vector machine with $\mathrm{L}_1$ regularization and $k$-means clustering, enables separation of the peaks and atomic center positioning with accuracy beyond the resolution of the measurement grid. The validity and efficiency of the proposed method are demonstrated using synthetic data in comparison to the conventional least-square method. Read More

The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been claimed recurrently that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking. Read More

We present an approach for reconstructing networks of pulse-coupled neuron-like oscillators from passive observation of pulse trains of all nodes. It is assumed that units are described by their phase response curves and that their phases are instantaneously reset by incoming pulses. Using an iterative procedure, we recover the properties of all nodes, namely their phase response curves and natural frequencies, as well as strengths of all directed connections. Read More

It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age since older papers had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count. Read More

The least-squares support vector machine is a frequently used kernel method for non-linear regression and classification tasks. Here we discuss several approximation algorithms for the least-squares support vector machine classifier. The proposed methods are based on randomized block kernel matrices, and we show that they provide good accuracy and reliable scaling for multi-class classification problems with relatively large data sets. Read More

The main goal of the paper is to develop an estimate for the conditional probability function of random stationary ergodic symbolic sequences with elements belonging to a finite alphabet. We elaborate a decomposition procedure for the conditional probability function of sequences considered as the high-order Markov chains. We represent the conditional probability function as the sum of multi-linear memory function monomials of different orders (from zero up to the chain order). Read More

A major challenge in network science is to determine whether an observed network property reveals some non-trivial behavior of the network's nodes, or if it is a consequence of the network's elementary properties. Statistical null models serve this purpose by producing random networks whilst keeping chosen network properties fixed. While there is increasing interest in networks that evolve in time, we still lack a robust time-aware framework to assess the statistical significance of the observed structural properties of growing networks. Read More

The new event generator TWOPEG for the channel $e p \rightarrow e' p' \pi^{+} \pi^{-}$ has been developed. It uses an advanced method of event generation with weights and employs the five-fold differential structure functions from the recent versions of the JM model fit to all results on charged double pion photo- and electroproduction cross sections from CLAS (both published and preliminary). In the areas covered by measured CLAS data, TWOPEG successfully reproduces the available integrated and single-differential double pion cross sections. Read More

The most critical time for information to spread is in the aftermath of a serious emergency, crisis, or disaster. Individuals affected by such situations can now turn to an array of communication channels, from mobile phone calls and text messages to social media posts, when alerting social ties. These channels drastically improve the speed of information in a time-sensitive event, and provide extant records of human dynamics during and after the event. Read More

We consider the ASEP and the stochastic six vertex models started with step initial data. After a long time $T$ it is known that the one-point height function fluctuations for these systems are of order $T^{1/3}$. We prove the KPZ prediction of $T^{2/3}$ scaling in space. Read More

A continuous time random walk (CTRW) model with waiting times following the Lévy-stable distribution with exponential cut-off in equilibrium is a simple theoretical model giving rise to normal, yet non-Gaussian diffusion. The distribution of the particle displacements is explicitly time-dependent and does not scale. Since fluorescence correlation spectroscopy (FCS) is often used to investigate diffusion processes, we discuss the influence of this lack of scaling on the possible outcome of the FCS measurements and calculate the FCS autocorrelation curves for such equilibrated CTRWs. Read More

We present a new method to locate the starting points in time of an arbitrary number of (damped) delayed signals. For a finite data sequence, the method first locates the starting point of the component with the longest delay and then, by iteration, all the preceding ones. Numerical examples are given and noise sensitivity is tested for weak noise. Read More