Physics - Data Analysis; Statistics and Probability Publications (50)


The syntactic structure of a sentence can be modelled as a tree, where vertices correspond to words and edges indicate syntactic dependencies. It has been repeatedly claimed that the number of edge crossings in real sentences is small. However, a baseline or null hypothesis has been lacking.
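
To make "edge crossing" concrete, here is a minimal sketch that counts crossings in a given linear arrangement. It assumes a hypothetical input format: words numbered by their position in the sentence, and the tree given as a list of (head, dependent) position pairs. Two edges cross when their endpoints interleave. The sketch only counts crossings. It does not compute the null-hypothesis baseline the abstract refers to.

```python
from itertools import combinations

def count_crossings(edges):
    """Count pairs of dependency edges that cross in a linear arrangement.

    `edges` is a list of (u, v) pairs of word positions (hypothetical format);
    the order within a pair does not matter.
    """
    spans = [tuple(sorted(e)) for e in edges]
    crossings = 0
    for (a, b), (c, d) in combinations(spans, 2):
        # Two edges cross iff their endpoints interleave strictly.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings

# Tree on four words: edges 1-3 and 2-4 interleave; edge 1-2 crosses neither.
n_cross = count_crossings([(1, 3), (2, 4), (1, 2)])
```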

It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age, since older papers have had more time to attract citations. Various procedures aim to suppress these biases, giving rise to new normalized indicators such as the relative citation count.
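
A minimal sketch of the relative citation count. It assumes the common definition: a paper's citations divided by the mean citation count of all papers from the same field and publication year. The record layout (dicts with `field`, `year`, `citations`) is hypothetical.

```python
from collections import defaultdict

def relative_citation_counts(papers):
    """Each paper's citations divided by the mean of its (field, year) group.

    `papers` is a list of dicts with keys 'field', 'year', 'citations'
    (hypothetical layout).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in papers:
        key = (p["field"], p["year"])
        totals[key] += p["citations"]
        counts[key] += 1
    result = []
    for p in papers:
        key = (p["field"], p["year"])
        mean = totals[key] / counts[key]
        # A group with zero mean citations has no meaningful normalization.
        result.append(p["citations"] / mean if mean > 0 else float("nan"))
    return result
```

Normalizing within (field, year) groups addresses both biases named above at once, at the cost of needing enough papers per group for a stable mean.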

The least-squares support vector machine is a frequently used kernel method for non-linear regression and classification tasks. Here we discuss several approximation algorithms for the least-squares support vector machine classifier. The proposed methods are based on randomized block kernel matrices, and we show that they provide good accuracy and reliable scaling for multi-class classification problems with relatively large data sets.
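
For reference, here is a minimal sketch of the exact (non-approximated) binary LS-SVM classifier in its standard dual form. It solves one dense linear system in the bias b and the multipliers alpha. The RBF kernel and the values of `gamma` and `sigma` are illustrative assumptions. The cubic cost of solving this dense system is what makes the exact method expensive for large data sets. The randomized block approximations described above are not implemented here.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM classifier system for labels y in {-1, +1}:

    [0   y^T             ] [b    ]   [0]
    [y   Omega + I/gamma ] [alpha] = [1],  Omega[k, l] = y_k y_l K(x_k, x_l).
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, y, b, alpha, X_new, sigma=1.0):
    # Decision rule: sign(sum_k alpha_k y_k K(x, x_k) + b).
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ (alpha * y) + b)
```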

The main goal of the paper is to develop an estimate for the conditional probability function of random stationary ergodic symbolic sequences whose elements belong to a finite alphabet. We develop a decomposition procedure for the conditional probability function of such sequences, regarded as high-order Markov chains. We represent the conditional probability function as the sum of multilinear memory-function monomials of different orders (from zero up to the chain order).
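
As a baseline, the conditional probability of an order-N chain can be estimated directly by counting, as P(next symbol | previous N symbols). The sketch below does this with plain frequency counts. It is not the memory-function decomposition proposed in the paper. It also becomes unreliable when the number of possible contexts is large relative to the sequence length, which is a known weakness of direct counting.

```python
from collections import Counter, defaultdict

def conditional_probabilities(seq, order):
    """Frequency estimate of P(next symbol | previous `order` symbols)."""
    context_counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        context_counts[context][seq[i]] += 1
    probs = {}
    for context, counter in context_counts.items():
        total = sum(counter.values())
        probs[context] = {sym: c / total for sym, c in counter.items()}
    return probs

p = conditional_probabilities("abaabaab", order=2)  # a period-3 pattern
```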

A major challenge in network science is to determine whether an observed network property reveals some non-trivial behavior of the network's nodes, or whether it is a consequence of the network's elementary properties. Statistical null models serve this purpose by producing random networks while keeping chosen properties of the network fixed. Although there is increasing interest in networks that evolve in time, we still lack a robust time-aware framework to assess the statistical significance of the observed structural properties of growing networks.
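
A common static null model keeps every node's degree fixed and randomizes the connections, for example by repeated double-edge swaps. Below is a minimal sketch for a simple undirected graph given as an edge list (a hypothetical input format). Note that it ignores the time ordering of nodes and edges, which is exactly what a time-aware framework for growing networks would need to respect.

```python
import random
from collections import Counter

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    return Counter(v for edge in edges for v in edge)

def degree_preserving_rewire(edges, n_swaps, seed=None, max_tries_factor=100):
    """Randomize a simple undirected graph by double-edge swaps.

    Each swap replaces edges (a, b), (c, d) with (a, d), (c, b), which leaves
    every node's degree unchanged. Swaps that would create self-loops or
    duplicate edges are rejected.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    swaps = tries = 0
    while swaps < n_swaps and tries < max_tries_factor * n_swaps:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap would be a no-op or create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would create a multi-edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        swaps += 1
    return edges
```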

A new event generator, TWOPEG, has been developed for the channel $e p \rightarrow e' p' \pi^{+} \pi^{-}$. It uses an advanced method of event generation with weights and employs the five-fold differential structure functions from recent versions of the JM model fit to all results on charged double pion photo- and electroproduction cross sections from CLAS (both published and preliminary). In the regions covered by measured CLAS data, TWOPEG successfully reproduces the available integrated and single-differential double pion cross sections.

The most critical time for information to spread is in the aftermath of a serious emergency, crisis, or disaster. Individuals affected by such situations can now turn to an array of communication channels, from mobile phone calls and text messages to social media posts, when alerting social ties. These channels drastically increase the speed at which information spreads in a time-sensitive event, and provide extant records of human dynamics during and after the event.

We consider the ASEP and the stochastic six vertex models started with step initial data. After a long time $T$ it is known that the one-point height function fluctuations for these systems are of order $T^{1/3}$. We prove the KPZ prediction of $T^{2/3}$ scaling in space.

A continuous time random walk (CTRW) model in equilibrium, with waiting times following the Lévy-stable distribution with an exponential cut-off, is a simple theoretical model giving rise to normal, yet non-Gaussian, diffusion. The distribution of particle displacements is explicitly time-dependent and does not scale. Since fluorescence correlation spectroscopy (FCS) is often used to investigate diffusion processes, we discuss the influence of this lack of scaling on the possible outcome of FCS measurements and calculate the FCS autocorrelation curves for such equilibrated CTRWs.

We present a new method to locate the starting points in time of an arbitrary number of (damped) delayed signals. For a finite data sequence, the method first locates the starting point of the component with the longest delay and then, by iteration, all the preceding ones. Numerical examples are given, and noise sensitivity is tested for weak noise.

Many problems in industry (and in the social, natural, information, and medical sciences) involve discrete data and benefit from approaches from subjects such as network science, information theory, optimization, probability, and statistics. Because the study of networks is concerned explicitly with connectivity between different entities, it has become very prominent in industrial settings, and this importance has been accentuated further amid the modern data deluge. In this article, we discuss the role of network analysis in industrial and applied mathematics, and we give several examples of network science in industry.

The Met Office Space Weather Operations Centre produces 24/7/365 space weather guidance, alerts, and forecasts for a wide range of government and commercial end users across the United Kingdom. Solar flare forecasts are one of its products. They are issued multiple times a day in two forms: forecasts for each active region on the solar disk over the next 24 hours, and full-disk forecasts for the next four days. Here the forecasting process is described in detail, and a first verification of archived forecasts is presented using methods commonly used in operational weather prediction.
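
Probabilistic forecasts such as flare probabilities are often verified with the Brier score. The Brier score is the mean squared difference between the forecast probability and the 0/1 outcome, and it is usually reported together with a skill score relative to climatology. The abstract does not say which measures the paper uses, so this is an illustration of one common choice, not a reproduction of the paper's verification.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

def brier_skill_score(probabilities, outcomes):
    # Skill relative to a constant climatological forecast (the observed event rate).
    # Undefined (division by zero) if all outcomes are identical.
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier_score(probabilities, outcomes) / reference
```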

When the model error is taken into account, one needs to evaluate the prior distribution (the Onsager-Machlup functional), which contains a divergence term that is difficult to calculate for large systems. However, using the Euler method for the time discretization of the functional eliminates the need to evaluate the divergence term. This property is useful for solving nonlinear data assimilation problems with sampling methods such as the Metropolis-adjusted Langevin algorithm.
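
The Metropolis-adjusted Langevin algorithm (MALA) itself can be sketched in a few lines. This is a generic sampler that assumes a differentiable log-density. It does not construct the Onsager-Machlup functional or its Euler discretization, which in the data-assimilation setting would supply `log_p` and `grad_log_p`. The step size is an illustrative assumption.

```python
import numpy as np

def mala(grad_log_p, log_p, x0, step, n_samples, seed=0):
    """Metropolis-adjusted Langevin algorithm for a vector-valued target."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples,) + x.shape)

    def log_q(x_to, x_from):
        # Log density (up to a constant) of the Langevin proposal x_from -> x_to.
        mean = x_from + 0.5 * step * grad_log_p(x_from)
        return -np.sum((x_to - mean) ** 2) / (2 * step)

    for i in range(n_samples):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        proposal = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * rng.standard_normal(x.shape)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_p(proposal) - log_p(x)
                     + log_q(x, proposal) - log_q(proposal, x))
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
        samples[i] = x
    return samples

# Standard normal target: log p(x) = -x^2 / 2, gradient -x.
draws = mala(lambda x: -x, lambda x: -0.5 * np.sum(x**2), x0=[3.0], step=0.5, n_samples=20000)
```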

We assess the skill and reliability of forecasts of winter and summer temperature, wind speed and irradiance over China, using the GloSea5 seasonal forecast system. Skill in such forecasts is important for the future development of seasonal climate services for the energy sector, allowing better estimates of forthcoming demand and renewable electricity supply. We find that although overall the skill from the direct model output is patchy, some high-skill regions of interest to the energy sector can be identified.

Every network scientist knows that preferential attachment combines with growth to produce networks with power-law in-degree distributions. How, then, is it possible for the citation network of the American Physical Society journal collection to have a log-normal citation distribution when it was found to have grown in accordance with preferential attachment? This anomalous result, which we dub the preferential attachment paradox, has remained unexplained since the physicist Sidney Redner first brought it to light over a decade ago. In this paper we propose a resolution to the paradox.

q-Gaussian distributions appear in many areas of science, in systems that can be described within a nonextensive framework. A common way to assess whether such systems belong to the nonextensive framework is numerical data analysis. To this end, we implement a random number generator for the q-Gaussian distribution, and we show how to compute its probability density function, cumulative distribution function, and quantile function, together with a tail-weight measure based on robust statistics.
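
One published way to generate q-Gaussian deviates is the generalized Box-Muller method. It replaces the natural logarithm in the classical Box-Muller transform with the q-logarithm, evaluated at q' = (1 + q)/(3 - q). The sketch below follows that method and may differ from the generator implemented in the paper. The scale of the output follows that method's convention, and q = 1 recovers the ordinary standard normal.

```python
import math
import random

def q_log(x, q):
    """q-logarithm ln_q(x) = (x**(1 - q) - 1) / (1 - q); equals ln(x) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_gaussian_sample(q, rng=random):
    """Draw one q-Gaussian deviate (q < 3) with the generalized Box-Muller method."""
    if q >= 3:
        raise ValueError("q-Gaussian requires q < 3")
    q_prime = (1.0 + q) / (3.0 - q)
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    # For u1 in (0, 1], ln_q'(u1) <= 0, so the square root is real.
    return math.sqrt(-2.0 * q_log(u1, q_prime)) * math.cos(2.0 * math.pi * u2)
```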

Modelling physical data with linear discrete time series, namely Fractionally Integrated Autoregressive Moving Average (ARFIMA) models, is a technique that has attracted attention in recent years. However, these models are used mainly as purely statistical tools, with little emphasis on the physical background of the model. The main reason for this lack of attention is that the ARFIMA model describes discrete-time measurements, whereas physical models are formulated using a continuous-time parameter.
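
The long-memory part of an ARFIMA model is the fractional difference operator (1 - B)^d, where B is the backshift operator. Its binomial-series weights obey the recursion w_k = w_{k-1} (k - 1 - d) / k, with w_0 = 1. The minimal sketch below uses a truncated series. It shows only this discrete-time operator, not the continuous-time connection the paper is concerned with. The truncation length is an assumption.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients w_k of (1 - B)**d = sum_k w_k B**k (B = backshift operator)."""
    weights = [1.0]
    for k in range(1, n_weights):
        # Binomial-series recursion: w_k = w_{k-1} * (k - 1 - d) / k.
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def frac_diff(x, d, n_weights=100):
    """Apply truncated fractional differencing of order d to a sequence x."""
    w = frac_diff_weights(d, n_weights)
    return [sum(w[k] * x[t - k] for k in range(min(t + 1, n_weights))) for t in range(len(x))]
```

For d = 1 the weights reduce to [1, -1, 0, ...], that is, ordinary first differencing. A non-integer d gives slowly decaying weights and hence long memory.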

The sensitivity of molecular dynamics to changes in the potential energy function plays an important role in understanding the dynamics and function of complex molecules. We present a method to obtain path ensemble averages of a perturbed dynamics from a set of paths generated by a reference dynamics. It is based on the concept of the path probability measure and on the Girsanov theorem, a result from stochastic analysis that allows one to estimate the change of measure of a path ensemble.

Neural data analysis has increasingly incorporated causal information to study circuit connectivity. Dimensionality reduction forms the basis of most analyses of large multivariate time series. Here, we present a new, multitaper-based decomposition for stochastic, multivariate time series that acts on the covariance of the time series at all lags, $C(\tau)$, as opposed to standard methods that decompose the time series, $\mathbf{X}(t)$, using only information at zero lag.
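
Zero-lag methods such as PCA use only C(0). The lagged covariance matrices C(τ), on which the proposed decomposition acts, can be estimated as in the minimal sketch below. It uses a plain (biased, divide-by-N) estimator and does not implement the multitaper decomposition itself.

```python
import numpy as np

def lagged_covariances(X, max_lag):
    """Return C[tau] = E[(x(t + tau) - mu)(x(t) - mu)^T] for tau = 0..max_lag.

    X has shape (n_times, n_channels); each estimate divides by n_times.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    Xc = X - X.mean(axis=0)
    return np.stack([Xc[tau:].T @ Xc[: n - tau] / n for tau in range(max_lag + 1)])
```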

We have carried out a detailed study of the scaling region using a detrended fractal analysis test, applying different forcings (noise, sinusoidal, and square waves) to the floating potential fluctuations acquired at different pressures in a DC glow discharge plasma. The transition in the dynamics is observed with recurrence plot techniques, which are efficient for detecting critical regime transitions in dynamics. The complexity of the nonlinear fluctuations has been revealed with recurrence quantification analysis, a suitable tool for investigating recurrence, a ubiquitous feature that provides deep insight into the dynamics of real dynamical systems.
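
Recurrence plots and recurrence quantification analysis (RQA) start from the recurrence matrix R_ij = Θ(ε - |x_i - x_j|). This matrix marks the pairs of times at which the signal revisits a similar state. The minimal sketch below works on a scalar series without time-delay embedding, which is a simplification since practical analyses usually embed the signal first. The threshold `eps` is a user choice. Only the simplest RQA measure, the recurrence rate, is shown.

```python
import numpy as np

def recurrence_matrix(x, eps):
    """R[i, j] = 1 if |x_i - x_j| <= eps (scalar series, no embedding)."""
    x = np.asarray(x, dtype=float)
    return (np.abs(x[:, None] - x[None, :]) <= eps).astype(int)

def recurrence_rate(R):
    # Fraction of recurrent points in the plot, the simplest RQA measure.
    return R.mean()
```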

We show that in a deep neural network trained with ReLU, the low-lying layers should be replaceable with truncated linearly activated layers. We derive the gradient descent equations in this truncated linear model and demonstrate that, if the distribution of the training data is stationary during training, the optimal weights in these low-lying layers are the eigenvectors of the covariance matrix of the data. If the training data are sufficiently random and uniform, these eigenvectors can be found using a small fraction of the training data, thus reducing the computational complexity of training.
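
The weight choice described above can be sketched as follows: estimate the data covariance from a small subsample and take its leading eigenvectors as the rows of a layer's weight matrix. The number of retained eigenvectors `k`, the 10% subsample, and the synthetic data are illustrative assumptions. The truncated-linear activation and the training loop are not shown.

```python
import numpy as np

def covariance_eigenvectors(data, k):
    """Top-k eigenvectors (as rows) of the sample covariance of `data` (n_samples x n_features)."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order].T

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 0.5])
W = covariance_eigenvectors(data[:200], k=2)  # estimated from 10% of the samples
```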

We describe the development of a new software tool, called "Pomelo", for the calculation of Set Voronoi diagrams. Voronoi diagrams are a spatial partition of the space around the particles into separate Voronoi cells, e.g.

We describe a strategy for constructing a neural network jet substructure tagger which powerfully discriminates boosted decay signals while remaining largely uncorrelated with the jet mass. This reduces the impact of systematic uncertainties in background modeling while enhancing signal purity, resulting in improved discovery significance relative to existing taggers. The network is trained using an adversarial strategy, resulting in a tagger that learns to balance classification accuracy with decorrelation.

Let $\Omega_\epsilon$ be a metallic plate whose top inaccessible surface has been damaged by some chemical or mechanical agent. We heat the opposite side and collect a sequence of temperature maps $u^\epsilon$. Here, we construct a formal explicit approximation of the damage $\epsilon\theta$ by solving a nonlinear inverse problem for the heat equation in three steps: (i) smoothing of temperature maps, (ii) domain derivative of the temperature, (iii) thin plate approximation of the model and perturbation theory.

Seismic data quality is vital to geophysical applications, so methods of data recovery, including denoising and interpolation, are common initial steps in the seismic data processing flow. We present a method to perform simultaneous interpolation and denoising based on double-sparsity dictionary learning. This extends previous work that addressed denoising only.

In this work we use the Maximum Entropy Principle (MEP) to infer the mass of an axion that interacts with photons and neutrinos in an effective low-energy theory. The Shannon entropy function to be maximized is suitably defined in terms of the axion branching ratios. We show that the MEP strongly constrains the axion mass when the current experimental bounds on the neutrino masses are taken into account.

Here we report on a set of programs developed at the ZMBH Bio-Imaging Facility for tracking in real-life images of cellular processes. These programs perform 1) automated tracking; 2) quantitative and comparative track analyses of different images in different groups; 3) different interactive visualization schemes; and 4) interactive realistic simulation of different cellular processes for validation and optimal problem-specific adjustment of image acquisition parameters (the trade-off between speed, resolution, and quality, with feedback from the very final results). The collection of programs is primarily developed for the common bio-image analysis software ImageJ (as a single Java plugin).

The predictions of parametric property models and their uncertainties are sensitive to systematic errors such as inconsistent reference data, parametric model assumptions, or inadequate computational methods. Here, we discuss the calibration of property models in the light of bootstrapping, a sampling method akin to Bayesian inference that can be employed for identifying systematic errors and for reliably estimating the prediction uncertainty. We apply bootstrapping to assess a linear property model linking the $^{57}$Fe Mössbauer isomer shift to the contact electron density at the iron nucleus for a diverse set of 44 molecular iron compounds.
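
A minimal sketch of bootstrapping a straight-line property model y = a x + b. The procedure resamples (x, y) pairs with replacement, refits by least squares, and reads uncertainties off the spread of the refitted parameters. In the application above, x would be the contact density and y the isomer shift. The sketch uses no data from the paper, and the number of bootstrap samples is an assumption.

```python
import numpy as np

def bootstrap_linear_fit(x, y, n_boot=2000, seed=0):
    """Bootstrap (pairs resampling) of slope and intercept for y = a*x + b.

    Returns arrays of bootstrap slopes and intercepts.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    slopes, intercepts = np.empty(n_boot), np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample data pairs with replacement
        A = np.column_stack([x[idx], np.ones(n)])
        (a, b), *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        slopes[i], intercepts[i] = a, b
    return slopes, intercepts
```

The standard deviation of `slopes` and `intercepts` then estimates the parameter uncertainty. Propagating each bootstrap pair to a new x gives a spread of predictions.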

Early and accurate discrimination of parkinsonian syndromes (PS) involving presynaptic degeneration from non-degenerative variants, such as Scans Without Evidence of Dopaminergic Deficit (SWEDD) and tremor disorders, is important for effective patient management, as the course, therapy, and prognosis differ substantially between the two groups. In this study, we use Single Photon Emission Computed Tomography (SPECT) images from healthy normal, early PD, and SWEDD subjects, obtained from the Parkinson's Progression Markers Initiative (PPMI) database, and process them to compute shape- and surface-fitting-based features for the three groups. We use these features to develop and compare various classification models that can discriminate scans showing dopaminergic deficit, as in PD, from scans without the deficit, as in healthy normal or SWEDD subjects.

Transitions between multiple stable states of nonlinear systems are ubiquitous in physics, chemistry, and beyond. Two types of behaviors are usually seen as mutually exclusive: unpredictable noise-induced transitions and predictable bifurcations of the underlying vector field. Here, we report a new situation, corresponding to a fluctuating system approaching a bifurcation, where both effects collaborate.

For many profilometry techniques, phase unwrapping is one of the most challenging processes. To sidestep phase unwrapping, Perciante et al. [Appl Opt 2015; 54(10):3018-23] proposed a wrapping-free method that retrieves the phase by direct integration of the spatial derivatives of the patterns.

The presence of multiple candidates per event can be a cause of biases which are large compared to statistical uncertainties. Selecting a single candidate is common practice but only helps if the likelihood of selecting the true candidate is very high. Otherwise, the precision of the measurement can be affected, and additional biases can be generated, even if none are present in the data sample prior to this operation.

In this paper, the calorimetric power measurement method for the electron cyclotron resonance heating system on EAST is presented. This method requires measurement of the water flow through the cooling circuits and of the inlet and outlet water temperatures in each cooling circuit. Usually, the inlet water temperature is kept stable to obtain more accurate results.
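
In steady state, the power absorbed by each cooling circuit follows from the heat balance P = ρ c_p Q (T_out - T_in). Here Q is the volumetric flow rate, and the total power is the sum over circuits. The minimal sketch below assumes steady flow and temperatures, which is a simplification of real measurements that integrate over the heating pulse. It also uses approximate constant water properties as default parameters.

```python
def calorimetric_power(circuits, rho=1000.0, c_p=4186.0):
    """Total absorbed power in watts, summed over cooling-water circuits.

    circuits: iterable of (flow_m3_per_s, t_in_celsius, t_out_celsius).
    rho [kg/m^3] and c_p [J/(kg K)] default to approximate values for liquid water.
    """
    return sum(rho * c_p * flow * (t_out - t_in) for flow, t_in, t_out in circuits)

# One circuit with 1 L/s of water warmed by 10 K absorbs about 41.9 kW.
power_w = calorimetric_power([(1e-3, 20.0, 30.0)])
```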

Image resolvability is the primary concern in imaging. This paper reports an estimation of the full width at half maximum of the point spread function from a Fourier domain plot of real sample images, without using test objects or defining a threshold criterion. We suggest that this method can be applied to any type of image, independently of the imaging modality.

A Bayesian approach is proposed for pulse shape discrimination of photons and neutrons in liquid organic scintillators. Instead of drawing a decision boundary, each pulse is assigned a photon or neutron confidence probability. This allows for photon and neutron classification on an event-by-event basis.
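
A confidence probability per pulse, instead of a hard decision boundary, follows from Bayes' rule. The minimal sketch below assumes a single pulse-shape feature (for example a tail-to-total charge ratio) with Gaussian class-conditional likelihoods and a chosen prior. That parametrization is hypothetical and is not taken from the paper. The posterior is evaluated in log space so that features far from both classes do not underflow.

```python
import math

def log_gaussian(x, mean, sigma):
    # Log of the normal density N(x; mean, sigma^2).
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def neutron_probability(feature, photon_model, neutron_model, prior_neutron=0.5):
    """Posterior P(neutron | feature), with 0 < prior_neutron < 1.

    photon_model / neutron_model: (mean, sigma) of the feature for each class
    (hypothetical Gaussian likelihoods).
    """
    log_n = log_gaussian(feature, *neutron_model) + math.log(prior_neutron)
    log_g = log_gaussian(feature, *photon_model) + math.log(1.0 - prior_neutron)
    diff = log_g - log_n
    # Logistic form of Bayes' rule, written to avoid overflow in exp().
    if diff > 0:
        return math.exp(-diff) / (1.0 + math.exp(-diff))
    return 1.0 / (1.0 + math.exp(diff))
```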

A fast physics analysis framework based on SNiPER has been developed to process the increasingly large data samples collected by BESIII. In this framework, a reconstructed event data model with SmartRef is designed to improve the speed of input/output operations, and the necessary physics analysis tools are migrated from BOSS to SNiPER. A real physics analysis, $e^{+}e^{-} \rightarrow \pi^{+}\pi^{-}J/\psi$, is used to test the new framework, which achieves a speed-up by a factor of 10.

A novel single-lead f-wave extraction algorithm based on the modern diffusion geometry data analysis framework is proposed. The algorithm is essentially an averaged beat subtraction algorithm, where the ventricular activity template is estimated by combining a newly designed metric, the diffusion distance, and the non-local Euclidean median based on the non-linear manifold setup. To validate the algorithm, two simulation schemes are proposed and tested, and state-of-the-art results are reported.
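
For orientation, here is average beat subtraction in its simplest form. Beats are aligned on detected R peaks (assumed to be given), averaged into a ventricular-activity template, and the template is subtracted, leaving the atrial f-wave in the residual. The paper instead builds the template with the diffusion distance and a non-local Euclidean median. This sketch uses a plain mean. The window lengths `pre` and `post` are assumptions and should be shorter than the RR interval so that the windows do not overlap.

```python
import numpy as np

def average_beat_subtraction(signal, r_peaks, pre, post):
    """Subtract a mean ventricular-activity template around each R peak.

    Only beats whose window [r - pre, r + post) lies fully inside the signal are used.
    Returns the residual, which ideally retains the atrial (f-wave) activity.
    """
    x = np.asarray(signal, dtype=float)
    peaks = [r for r in r_peaks if r - pre >= 0 and r + post <= len(x)]
    segments = np.array([x[r - pre:r + post] for r in peaks])
    template = segments.mean(axis=0)  # plain mean, not a non-local median
    residual = x.copy()
    for r in peaks:
        residual[r - pre:r + post] -= template
    return residual
```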

We propose a 2D generalization to the $M$-band case of the dual-tree decomposition structure (initially proposed by N. Kingsbury and further investigated by I. Selesnick) based on a Hilbert pair of wavelets.

Dynamical equations describing physical systems at statistical equilibrium are commonly extended by mathematical tools called "thermostats". These tools are designed for sampling ensembles of statistical mechanics. We propose a dynamic principle for derivation of stochastic and deterministic thermostats.

In this paper we propose a 'knee-like' approximation of the lateral distribution of the Cherenkov light from extensive air showers in the energy range 30-3000 TeV and study the possibility of its practical application in high energy ground-based gamma-ray astronomy experiments (in particular, in TAIGA-HiSCORE). The approximation has very good accuracy for individual showers and can easily be simplified for practical application in the HiSCORE wide-angle timing array when only a limited number of stations are triggered.

This work is a methodical study of another option of the hybrid method originally aimed at gamma/hadron separation in the TAIGA experiment. In the present paper this technique is applied to distinguish between different mass groups of cosmic rays in the energy range 200-500 TeV. The study was based on simulation data of the TAIGA prototype and included analysis of the geometrical form of images produced by different nuclei in the IACT simulation, as well as of shower core parameters reconstructed using the timing array simulation.

A concise derivation of the "Joel equations", which allow for the determination of the axis angle 2V from measurements of extinction directions on a spindle stage, is provided starting from the wave equation. Only analytic methods are used; no geometric arguments referring to stereographic projections are invoked. For error-free data, the resulting equations allow for a closed-form solution.

A fractal bears a complex structure that is reflected in a scaling hierarchy of far more small things than large ones. This scaling hierarchy can be effectively derived by head/tail breaks, a classification scheme for data with a heavy-tailed distribution, and quantified by the ht-index, an integer induced by head/tail breaks. This paper refines the ht-index as a fraction with which to measure the scaling hierarchy of a fractal more precisely as a whole, and further assigns a fractional ht-index to each individual value of a data series that represents the fractal.
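
The integer ht-index that the paper refines can be computed directly from head/tail breaks. The procedure repeatedly splits the data at its mean and keeps the head while the head remains a minority. The ht-index is the number of resulting classes. This minimal sketch assumes the head is the set of values strictly above the mean. It applies the often-quoted rule that the head must stay below 40% of the data, exposed here as the parameter `head_limit`. The fractional ht-index proposed in the paper is not implemented.

```python
def ht_index(values, head_limit=0.4):
    """Integer ht-index via head/tail breaks.

    Repeatedly split the data at its mean and keep the head (values above the
    mean) as long as the head stays a minority (share below `head_limit`).
    The ht-index is the number of resulting classes.
    """
    data = list(values)
    index = 1
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        if not head or len(head) / len(data) >= head_limit:
            break
        index += 1
        data = head
    return index
```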

Comprehensive two-dimensional gas chromatography (GCxGC) plays a central role in the elucidation of complex samples. Automating the identification of peak areas is of prime interest for fast and repeatable analysis of chromatograms. To determine the concentration of compounds or pseudo-compounds, templates of blobs are defined and superimposed on a reference chromatogram.

The family of visibility algorithms was recently introduced as a set of mappings between time series and graphs. Here we extend this method to characterize spatially extended data structures by mapping scalar fields of arbitrary dimension into graphs. After introducing several possible extensions, we provide analytical results on some topological properties of the graphs associated with some types of real-valued matrices, which can be understood as the high and low disorder limits of real-valued scalar fields.
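
For reference, here is the one-dimensional natural visibility criterion for time series, which the paper generalizes to scalar fields. Points i < j are connected if every intermediate value satisfies y_k < y_j + (y_i - y_j)(j - k)/(j - i), that is, if every intermediate point lies strictly below the straight line joining (i, y_i) and (j, y_j). The sketch below is a brute-force version whose cost is cubic in the series length. The higher-dimensional extensions are not shown.

```python
def natural_visibility_graph(series):
    """Edges (i, j), i < j, of the natural visibility graph of a time series.

    i and j are linked if every intermediate point lies strictly below the
    straight line joining (i, y_i) and (j, y_j).
    """
    y = list(series)
    n = len(y)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if all(y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i) for k in range(i + 1, j)):
                edges.add((i, j))
    return edges
```

Neighbouring points are always connected, and a tall intermediate value blocks the view between its neighbours.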

A simple 'knee-like' approximation of the Lateral Distribution Function (LDF) of Cherenkov light emitted by EAS (extensive air showers) in the atmosphere is proposed for solving various data analysis tasks in HiSCORE and other wide-angle ground-based experiments designed to detect gamma rays and cosmic rays with energies above tens of TeV. Simulation-based parametric analysis of individual LDF curves revealed that, at radial distances of 20-500 m, the 5-parameter 'knee-like' approximation fits individual LDFs as well as the mean LDF with very good accuracy. In this paper we demonstrate the efficiency and flexibility of the 'knee-like' LDF approximation for various primary particles and shower parameters, and the advantages of its application to suppressing the proton background and selecting primary gamma rays.

This work is a methodical study on hybrid reconstruction techniques for hybrid imaging/timing Cherenkov observations. This type of hybrid array is to be realized at the gamma-observatory TAIGA intended for very high energy gamma-ray astronomy (>30 TeV). It aims at combining the cost-effective timing-array technique with imaging telescopes.

A 'knee-like' approximation of Cherenkov light Lateral Distribution Functions, which we developed earlier, is now used for the practical task of background rejection in high energy (tens and hundreds of TeV) gamma-ray astronomy. In this work we apply this technique to the HiSCORE wide-angle timing array, which consists of Cherenkov light detectors with a spacing of 100 m, presently covering 0.2 km$^2$ and up to 5 km$^2$ in the future.

Most time series in nature are a mixture of signals with deterministic and random dynamics, so distinguishing between these two characteristics is important. Distinguishing between chaotic and aleatory signals is difficult because they share a wide-band power spectrum and a delta-like autocorrelation function, among other features.