Quantitative Biology - Quantitative Methods Publications (50)

Motivation: Metabolomics data is typically scaled to a common reference such as a constant volume of body fluid, a constant creatinine level, or a constant area under the spectrum. Such normalization of the data, however, may affect the selection of biomarkers and the biological interpretation of results in unforeseen ways. Results: First, we study how the outcome of hypothesis tests for differential metabolite concentration is affected by the choice of scale.
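
A minimal Python sketch of constant-total-area scaling, one of the references named above, shows why the choice of scale matters. The intensities are invented for illustration and `normalize_total_area` is a hypothetical helper, not the authors' code. Only one metabolite changes between the two samples, yet after scaling the others appear to change too.

```python
def normalize_total_area(spectrum):
    """Scale one sample's intensities so they sum to 1 (constant total area)."""
    total = sum(spectrum)
    return [x / total for x in spectrum]

# Hypothetical intensities for three metabolites in two samples: only
# metabolite 0 differs (it doubles); metabolites 1 and 2 are identical.
control = [10.0, 5.0, 5.0]
treated = [20.0, 5.0, 5.0]

norm_control = normalize_total_area(control)  # [0.5, 0.25, 0.25]
norm_treated = normalize_total_area(treated)  # about [0.667, 0.167, 0.167]

# After scaling, metabolites 1 and 2 appear to decrease even though their
# absolute amounts did not change.
for i, (c, t) in enumerate(zip(norm_control, norm_treated)):
    print(f"metabolite {i}: {c:.3f} -> {t:.3f}")
```

Any test applied to the scaled values would flag metabolites 1 and 2 as "decreased". This is the kind of scale-induced artefact the abstract warns about.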

Camera traps are a relatively new but already popular tool for estimating the abundance of non-identifiable animals. Although camera traps are convenient to use, both theoretical complications (such as spatial autocorrelation and the false-negative problem) and practical difficulties (for example, laborious random sampling) remain. In this article, we propose an alternative approach that bypasses these problems.

Motivation: We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets. It proved scalable and capable of greatly improving clustering performance, and it provides valuable insights by making the data more interpretable through better visualization.

This paper introduces CIM, a nonparametric copula-based approach for detecting the strength and monotonicity of linear and nonlinear statistical dependence between bivariate continuous, discrete, or hybrid random variables and stochastic signals. We show that CIM satisfies the data processing inequality and is consequently a self-equitable metric. Simulation results using synthetic datasets reveal that CIM compares favorably with other state-of-the-art statistical dependency metrics, in both statistical power and sample size requirements. These metrics include the Maximal Information Coefficient (MIC), Randomized Dependency Coefficient (RDC), distance Correlation (dCor), Copula correlation (Ccor), and Copula Statistic (CoS).

Understanding the relationship between spontaneous stochastic fluctuations and the topology of the underlying gene regulatory network is fundamental for the study of gene expression, especially at the molecular level. Here, we derive the analytical steady-state distribution of the protein copy number in a general kinetic model of gene expression. This reveals a quantitative relation between stochastic fluctuations and feedback regulation at the single-molecule level, which provides novel insights into how, and to what extent, a feedback loop can enhance or suppress molecular fluctuations. Based on this relation, we also develop an effective method to extract the topological information of gene regulatory networks from single-cell gene expression data.

Many species of millimetric fungus-harvesting termites collectively build uninhabited, massive mound structures enclosing a network of broad tunnels that protrude from the ground meters above their subterranean nests. It is widely accepted that the purpose of these mounds is to give the colony a controlled micro-climate in which to raise fungus and brood by managing heat, humidity, and respiratory gas exchange. Several hypotheses have been proposed for ventilating the mound, including steady and fluctuating external wind and internal metabolic heating. However, the absence of direct in-situ measurements of internal air flows has precluded the identification of a definitive mechanism for this critical physiological function.

We introduce SIM-CE, an advanced, user-friendly modeling and simulation environment in Simulink for performing multi-scale behavioral analysis of the nervous system of Caenorhabditis elegans (C. elegans). SIM-CE contains an implementation of the mathematical models of C. …

Caenorhabditis elegans (C. elegans) exhibits remarkable behavioral plasticity, including complex non-associative and associative learning representations. Understanding the principles of such mechanisms may inspire the design of efficient learning algorithms.

Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require substantial computational effort to yield reliable inferences. One of the major challenges in phylogenetics is the estimation of the marginal likelihood.

To understand the nature of a cell, one needs to understand the structure of its genome. For this purpose, experimental techniques such as Hi-C, which detects chromosomal contacts, are used to probe the three-dimensional structure of the genome. These experiments yield topological information, consistently showing a hierarchical subdivision of the genome into self-interacting domains across many organisms.

The complicated, evolving landscape of cancer mutations makes it a formidable challenge to identify cancer genes among the long lists of mutations typically generated in NGS experiments. The ability to prioritize these variants is therefore of paramount importance. To address this issue, we developed OncoScore, a text-mining tool that ranks genes according to their association with cancer, based on the available biomedical literature.

The increasing capacity of high-throughput genomic technologies for generating time-course data has stimulated a rich debate on the most appropriate methods for highlighting crucial aspects of data structure. In this work, we address the problem of sparse co-expression network representation of several time-course stress responses in Saccharomyces cerevisiae. We quantify the information preserved from the original datasets under a graph-theoretical framework and evaluate how cross-stress features can be identified.

In the scientific literature, there are many programs that predict linear B-cell epitopes from a protein sequence. Each program generates multiple B-cell epitopes that can be studied individually. This paper defines a function, called …, that combines results from five different prediction programs concerning linear B-cell epitopes (i.e. …

Background: Discovering protein function from novel primary sequences is essential. Wet-lab experimental procedures are not only time-consuming but also costly, so reliably predicting protein structure and function from the amino acid sequence alone has significant value. TATA-binding protein (TBP) is a DNA-binding protein that plays a key role in transcription regulation.

In a conformational nonequilibrium steady state (cNESS), enzyme turnover is modulated by the underlying conformational dynamics. Based on a discrete kinetic network model, we use the integrated population flux balance method to derive the cNESS turnover rate for a conformation-modulated enzymatic reaction. The traditional Michaelis-Menten (MM) rate equation is extended to a generalized form, which includes non-MM corrections induced by conformational population currents within combined cyclic kinetic loops.
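
For reference, the traditional Michaelis-Menten rate that the cNESS result generalizes can be written as a one-line function. This minimal sketch uses hypothetical parameter names (`k_cat`, `e_total`, `k_m`). It omits the non-MM correction terms arising from conformational currents, because the abstract does not state them explicitly.

```python
def michaelis_menten_rate(s, k_cat, e_total, k_m):
    """Classical Michaelis-Menten turnover rate v = k_cat * E_total * S / (K_M + S).

    This is only the traditional baseline; the cNESS generalization adds
    non-MM corrections that are not modelled here.
    """
    if s < 0 or k_m <= 0:
        raise ValueError("substrate concentration must be >= 0 and K_M > 0")
    return k_cat * e_total * s / (k_m + s)

# Hypothetical parameters: at S = K_M the rate is half of V_max = k_cat * E_total.
print(michaelis_menten_rate(2.0, k_cat=10.0, e_total=1.0, k_m=2.0))  # 5.0
```

Deviations of a measured rate from this hyperbolic form are what the generalized equation attributes to conformational population currents.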

Calcium imaging has emerged as a workhorse method in neuroscience for investigating patterns of neuronal activity. Instrumentation to acquire calcium imaging movies has progressed rapidly and has become standard across labs. Still, algorithms to automatically detect and extract activity signals from calcium imaging movies vary widely from lab to lab, and more advanced algorithms are continuously being developed.

The state-of-the-art method for automatically segmenting white matter bundles in diffusion-weighted MRI is tractography in conjunction with streamline cluster selection. This process involves long chains of processing steps, which are not only computationally expensive but also complex to set up and tedious with respect to quality control. Direct bundle segmentation methods treat the task as a traditional image segmentation problem.

Electrical impedance tomography (EIT) provides functional images of an electrical conductivity distribution inside the human body. Since the 1980s, many potential clinical applications of inexpensive portable EIT devices have emerged. EIT acquires multiple trans-impedance measurements across the body from an array of surface electrodes around a chosen imaging slice.

In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data are accumulating at an unprecedented pace. This provides large sets of so-called homologous, i.e. …

Aim: To develop a generalised flow cytometry protocol that uses reference beads to enumerate live and dead bacteria present in a mixture. Methods and Results: Mixtures of live and dead Escherichia coli with live:dead concentration ratios varying from 0 to 100% were prepared. These samples were stained with SYTO 9 and propidium iodide, and 6 µm reference beads were added.

We present the *K-means clustering algorithm and its source code. These are obtained by expanding the statistical clustering methods applied to quantitative finance in https://ssrn.com/abstract=2802753. *K-means is essentially deterministic, without the need to specify initial centers, etc.

Indexing massive data sets is extremely expensive for large-scale problems. In many fields, huge amounts of data are currently generated; however, extracting meaningful information from voluminous data sets, such as computing similarity between elements, is far from trivial. It nonetheless remains a fundamental need.

High hydrostatic pressure is commonly encountered in many environments, but the effects of high pressure on eukaryotic cells have been understudied. To understand the effects of hydrostatic pressure in the model eukaryote Saccharomyces cerevisiae, we have performed quantitative experiments on cell division, cell morphology, and cell death under a wide range of pressures. We developed an automated image analysis method for quantifying the yeast budding index (a measure of cell cycle state). We also developed a continuum model of budding to investigate the effect of pressure on cell division and cell morphology.

Most of the calcium in the body is stored in bone. The rest is stored elsewhere, and calcium signalling is one of the most important mechanisms of information propagation in the body. Yet, many questions remain open.

We quantify the amount of regulation required to control growth in living cells using a Maximum Entropy approach to the space of underlying metabolic states described by genome-scale models. Results obtained for E. coli and human cells are consistent with experiments and point to different regulatory strategies by which growth can be fostered or repressed.

While we once thought of cancer as a single monolithic disease affecting a specific organ site, we now understand that there are many subtypes of cancer defined by unique patterns of gene mutations. These gene mutational data, which can be obtained more reliably than gene expression data, help to determine how the subtypes develop, evolve, and respond to therapies. Most existing cancer subtype discovery algorithms use dense, continuous-valued gene expression data. Somatic mutational data, by contrast, are extremely sparse and heterogeneous, because there are less than 0.…

We study a generic one-dimensional model for an intracellular cargo driven by N motor proteins against an external applied force. The model includes motor-cargo and motor-motor interactions. The cargo motion is described by an over-damped Langevin equation. Motor dynamics are specified by hopping rates that satisfy a local detailed balance condition with respect to the change in energy per hopping event.

A novel single-lead f-wave extraction algorithm based on the modern diffusion geometry data analysis framework is proposed. The algorithm is essentially an averaged beat subtraction algorithm. The ventricular activity template is estimated by combining a newly designed metric, the diffusion distance, with the non-local Euclidean median based on the nonlinear manifold setup. To validate the algorithm, two simulation schemes are proposed and tested, and state-of-the-art results are reported.

South and Central American countries are preparing for increased birth defects from Zika virus outbreaks and planning mitigation strategies to minimize ongoing and future outbreaks. In this context, understanding important characteristics of Zika outbreaks, and how they vary across regions, is a challenging and important problem. We developed a mathematical model for the 2015 Zika virus outbreak dynamics in Colombia, El Salvador, and Suriname. We fit the model to publicly available data provided by the Pan American Health Organization, using Approximate Bayesian Computation to estimate parameter distributions and provide uncertainty quantification.
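
The study's model and data are not reproduced here. As a hedged illustration of the general idea behind Approximate Bayesian Computation, the following minimal Python sketch runs rejection ABC on a hypothetical toy model: binomial counts with an unknown success probability `p` and a uniform prior. The model, the summary statistic (mean count) and the tolerance `epsilon` are all assumptions chosen for the example.

```python
import random
import statistics

def simulate_counts(p, n_trials, n_obs, rng):
    """Hypothetical toy model: n_obs counts, each Binomial(n_trials, p)."""
    return [sum(rng.random() < p for _ in range(n_trials)) for _ in range(n_obs)]

def abc_rejection(observed, n_trials, n_draws, epsilon, rng):
    """Rejection ABC: draw p from a Uniform(0, 1) prior, simulate a data set,
    and keep p if the simulated mean count lies within epsilon of the observed mean."""
    observed_mean = statistics.mean(observed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # prior draw
        simulated = simulate_counts(p, n_trials, len(observed), rng)
        if abs(statistics.mean(simulated) - observed_mean) <= epsilon:
            accepted.append(p)
    return accepted

rng = random.Random(0)
observed = simulate_counts(0.3, n_trials=50, n_obs=10, rng=rng)  # synthetic "data"
posterior = abc_rejection(observed, n_trials=50, n_draws=2000, epsilon=0.5, rng=rng)
print(f"accepted {len(posterior)} draws; posterior mean of p = {statistics.mean(posterior):.3f}")
```

The accepted draws approximate the posterior of `p`, and their spread gives the uncertainty quantification. Shrinking `epsilon` improves the approximation at the cost of fewer accepted draws.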

State-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools have led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and large-scale genome organization. Could part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT).

Advances in synthetic biology allow us to engineer bacterial collectives with pre-specified characteristics. However, the behavior of these collectives is difficult to understand, as cellular growth and division as well as extra-cellular fluid flow lead to complex, changing arrangements of cells within the population. To rationally engineer and control the behavior of cell collectives we need theoretical and computational tools to understand their emergent spatiotemporal dynamics.

Optimal subset selection is an important task with many application areas, and numerous algorithms have been designed for it. STPGA searches for solutions of generic optimal subset selection problems using a special genetic algorithm with two additions. The first is a tabu memory property, which keeps track of previously tried solutions and their fitness for a number of iterations. The second is a regression of the solutions' fitness on their coding, used to form the ideal estimated solution (look-ahead property). I initially developed the programs for the specific problem of selecting training populations for genomic prediction or association problems. I therefore discuss the theory behind the optimal design of experiments to explain the default optimization criteria in STPGA, and illustrate the use of the programs in this endeavor.

Stochastic exponential growth is observed in a variety of contexts, including molecular autocatalysis, nuclear fission, population growth, inflation of the universe, viral social media posts, and financial markets. Yet the literature on modeling the phenomenology of these stochastic dynamics has predominantly focused on one model, Geometric Brownian Motion (GBM), which can be described as the solution of a Langevin equation with linear drift and linear multiplicative noise. Using recent experimental results on stochastic exponential growth of individual bacterial cell sizes, we motivate the need for a more general class of phenomenological models of stochastic exponential growth. These models should be consistent with the observation that the mean-rescaled distributions are approximately stationary at long times.
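
Geometric Brownian Motion, dX = μX dt + σX dW, has the exact solution X_t = X_0 exp((μ − σ²/2)t + σW_t). From this, the mean-rescaled variable X_t/E[X_t] has variance e^{σ²t} − 1, which keeps growing with t. Under GBM, therefore, the mean-rescaled distribution never becomes stationary. The minimal Python sketch below checks this by sampling. The parameter values are arbitrary assumptions, and the code illustrates only the GBM baseline, not the more general model class the authors propose.

```python
import math
import random
import statistics

def gbm_samples(x0, mu, sigma, t, n_paths, rng):
    """Draw X_t for dX = mu*X dt + sigma*X dW via the exact solution
    X_t = x0 * exp((mu - sigma**2 / 2) * t + sigma * W_t), with W_t ~ N(0, t)."""
    drift = (mu - 0.5 * sigma**2) * t
    return [x0 * math.exp(drift + sigma * rng.gauss(0.0, math.sqrt(t)))
            for _ in range(n_paths)]

rng = random.Random(1)
mu, sigma = 1.0, 0.2  # hypothetical growth rate and noise strength
results = {}
for t in (1.0, 5.0, 10.0):
    xs = gbm_samples(1.0, mu, sigma, t, 20000, rng)
    mean_x = statistics.mean(xs)
    # Variance of the mean-rescaled variable X_t / E[X_t]; theory: exp(sigma^2 t) - 1.
    results[t] = statistics.pvariance([x / mean_x for x in xs])
    print(f"t={t:4.1f}  var(X/mean)={results[t]:.3f}  theory={math.expm1(sigma**2 * t):.3f}")
```

The steadily widening rescaled distribution is the GBM behaviour that the single-cell observations of approximate stationarity argue against.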

There are several indications that the brain is organized not on the basis of individual unreliable neurons, but on a microcircuit scale that provides Lego blocks used to create complex architectures. At such an intermediate scale, the firing activity in the microcircuits is governed by collective effects. These arise from the background noise eliciting spontaneous firing, the degree of mutual connection between the neurons, and the topology of the connections. We compare the spontaneous firing activity of small populations of neurons adhering to an engineered scaffold with simulations of biologically plausible CMOS artificial neuron populations whose spontaneous activity is ignited by tailored background noise.

Estimating vaccination uptake is an integral part of ensuring public health. It was recently shown that vaccination uptake can be estimated automatically from web data, instead of slowly collected clinical records or population surveys. All prior work in this area assumes that features of vaccination uptake collected from the web are temporally regular.

Forest biomass has conventionally been estimated from the mean volume (m³ ha⁻¹) using a volume-to-biomass relationship. However, few studies have been reported on the verification of volume-biomass equations regressed using field data. Possible bias may result from the volume measurements and from extrapolations from sample plots to stands or a unit area.

The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. Existing EBM methods are all generative in nature.

Consider the problem of modeling hysteresis for finite-state random walks using higher-order Markov chains. This Letter introduces a Bayesian framework for determining, from data, the number of prior states of recent history on which a trajectory is statistically dependent. The general recommendation is to use leave-one-out cross-validation, with an easily computable formula that is provided in closed form.
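
The Letter's own closed-form expression is not given in the abstract. The minimal Python sketch below shows one closed-form leave-one-out score for choosing a Markov order. It assumes an independent symmetric Dirichlet(α) prior on the transition probabilities of each history, which may differ from the Letter's setup. Leaving out one observed transition (h, s) gives the predictive probability (n_hs − 1 + α)/(n_h − 1 + Kα), where K is the alphabet size. The test sequence and all names are hypothetical.

```python
import math
import random
from collections import Counter

def loo_log_score(seq, order, alphabet_size, alpha=1.0, start=0):
    """Closed-form leave-one-out log predictive score of an order-`order`
    Markov chain, assuming an independent symmetric Dirichlet(alpha) prior on
    the transition probabilities of each history (an assumption of this sketch).
    Only transitions at positions >= start are scored, so different orders can
    be compared on the same data."""
    pair_counts = Counter()  # (history, next symbol) -> count
    hist_counts = Counter()  # history -> count
    for i in range(max(start, order), len(seq)):
        history = tuple(seq[i - order:i])
        pair_counts[(history, seq[i])] += 1
        hist_counts[history] += 1
    score = 0.0
    for (history, _symbol), n_hs in pair_counts.items():
        # Posterior predictive of one occurrence after removing that occurrence.
        p = (n_hs - 1 + alpha) / (hist_counts[history] - 1 + alphabet_size * alpha)
        score += n_hs * math.log(p)
    return score

# Hypothetical test data: the next symbol is the XOR of the previous two,
# flipped with probability 0.1, so the true order is 2.
rng = random.Random(42)
seq = [0, 1]
for _ in range(2000):
    nxt = seq[-1] ^ seq[-2]
    if rng.random() < 0.1:
        nxt ^= 1
    seq.append(nxt)

max_order = 4
scores = {k: loo_log_score(seq, k, alphabet_size=2, start=max_order)
          for k in range(max_order + 1)}
for k, s in scores.items():
    print(f"order {k}: LOO log score {s:.1f}")
print("selected order:", max(scores, key=scores.get))
```

Orders 0 and 1 score far below order 2 here, because neither can see the two-step dependence. Higher orders gain little fit and pay for their extra parameters through the leave-one-out counts.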

Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24.…

Boolean matrix factorisation aims to decompose a binary data matrix into an approximate Boolean product of two low rank, binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns. We introduce the OrMachine, a probabilistic generative model for Boolean matrix factorisation and derive a Metropolised Gibbs sampler that facilitates efficient parallel posterior inference. On real world and simulated data, our method outperforms all currently existing approaches for Boolean matrix factorisation and completion.
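
The Boolean product that the factorisation approximates replaces sums with OR and products with AND. This minimal Python sketch shows only that product and a Hamming reconstruction error, not the OrMachine's sampler. The factor matrices and the noisy data matrix are hypothetical examples.

```python
def boolean_product(z, u):
    """Boolean matrix product X = Z o U^T, where
    X[n][d] = OR over l of (Z[n][l] AND U[d][l])."""
    return [[int(any(z_row[l] and u_row[l] for l in range(len(z_row))))
             for u_row in u]
            for z_row in z]

def hamming_error(x, x_hat):
    """Number of entries where the reconstruction differs from the data."""
    return sum(a != b for row, row_hat in zip(x, x_hat) for a, b in zip(row, row_hat))

# Hypothetical factors: 3 observations, 3 features, 2 latent patterns.
z = [[1, 0],   # observation 0 uses pattern 0
     [0, 1],   # observation 1 uses pattern 1
     [1, 1]]   # observation 2 uses both patterns
u = [[1, 0],   # feature 0 belongs to pattern 0
     [1, 1],   # feature 1 belongs to both patterns
     [0, 1]]   # feature 2 belongs to pattern 1

x_hat = boolean_product(z, u)
print(x_hat)  # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]

x_obs = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # hypothetical data with one flipped entry
print(hamming_error(x_obs, x_hat))  # 1
```

Entry (2, 1) is 1 rather than 2, as it would be in an ordinary matrix product: overlapping patterns combine by OR. This is what keeps both factors and the reconstruction binary.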

Parametric imaging is a compartmental approach that processes nuclear imaging data to estimate the spatial distribution of the kinetic parameters governing tracer flow. The present paper proposes a novel and efficient computational method for parametric imaging. The method is potentially applicable to several compartmental models of diverse complexity and is effective in determining the parametric maps of all kinetic coefficients. We consider applications to [18F]-fluorodeoxyglucose Positron Emission Tomography (FDG-PET) data. We analyze the two-compartment catenary model describing standard FDG metabolization by a homogeneous tissue, and the three-compartment non-catenary model representing renal physiology.

Based on a set of subjects and a collection of descriptors obtained from the Alzheimer's Disease Neuroimaging Initiative database, we use redescription mining to find rules revealing associations between these determinants, which provide insights into Alzheimer's disease (AD). We applied a four-step redescription mining algorithm (CLUS-RM), which has been extended to enable constraint-based redescription mining (CBRM) and several modes of targeted exploration of specific, user-defined associations. To a large extent, we confirmed known findings previously reported in the literature.

The stochastic dynamics of networks of biochemical reactions in living cells are typically modelled using chemical master equations (CMEs). The stationary distributions of CMEs are seldom solvable analytically, and few methods exist that yield numerical estimates with computable error bounds. Here, we present two such methods based on mathematical programming techniques.
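
For contrast with the cases the abstract targets, one of the rare CMEs with an analytical stationary solution is the one-species immigration-death process: 0 → X at rate k and X → 0 at rate γn. Its stationary distribution is Poisson(k/γ). The minimal Python sketch below computes it on a truncated state space and compares it with the exact result. The rates and truncation level are hypothetical. This naive truncation gives no certified error bound, unlike the mathematical-programming methods described above.

```python
import math

def birth_death_stationary(k, gamma, n_max):
    """Stationary distribution of the CME for 0 -> X (rate k), X -> 0 (rate gamma*n),
    truncated to states 0..n_max and renormalized. For this one-species chain,
    detailed balance gives p(n+1) / p(n) = k / (gamma * (n + 1))."""
    p = [1.0]
    for n in range(n_max):
        p.append(p[-1] * k / (gamma * (n + 1)))
    total = sum(p)
    return [x / total for x in p]

k, gamma = 10.0, 1.0  # hypothetical production and degradation rates
p = birth_death_stationary(k, gamma, n_max=60)
poisson = [math.exp(-k) * k**n / math.factorial(n) for n in range(61)]
max_diff = max(abs(a - b) for a, b in zip(p, poisson))
print(f"max |truncated - Poisson| = {max_diff:.2e}")
```

Networks with several interacting species generally lack such a detailed-balance shortcut. In those cases, bounded numerical estimates like the ones proposed in the paper become necessary.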

Low grade gliomas (LGGs) are infiltrative and incurable primary brain tumours with typically slow evolution. These tumours usually occur in young and otherwise healthy patients, which makes treatment planning controversial, since aggressive treatment may lead to undesirable side effects. Thus, for management decisions, it would be valuable to obtain early estimates of LGG growth potential.

Heparan sulfate (HS) is a linear, polydisperse sulfated polysaccharide belonging to the glycosaminoglycan family. HS proteoglycans are found ubiquitously at the cell surface and in the extracellular matrix of animal species. HS interacts with a wide variety of proteins and is involved in the regulation of many biological activities.

We study the population size time series of a Neotropical small mammal with the intent of detecting and modelling population regulation processes generated by density-dependent factors and their possible delayed effects. Analysis tools based on principles of statistical generality are nowadays commonly used to describe these phenomena. In general, however, such tools are better at producing a clear diagnosis than at providing valuable models. For this reason, in our approach, we detect the principal temporal structures on the basis of different correlation measures. From these results, we build an ad hoc minimalist autoregressive model that incorporates the main drivers of the dynamics.

This paper analyses an SIRS-type model for infectious diseases that accounts for behavioural changes associated with the simultaneous spread of awareness in the population. Two types of awareness are included in the model: private awareness associated with direct contacts between unaware and aware populations, and a public information campaign. Stability analysis of different steady states in the model provides information about the potential spread of disease in a population, as well as about how the disease dynamics are affected by the two types of awareness.
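
For orientation, the minimal Python sketch below integrates only the basic SIRS core of such a model with forward Euler. The awareness compartments and the information campaign are deliberately omitted, so this is not the paper's model. The rate names β (infection), γ (recovery) and ξ (loss of immunity) and their values are assumptions. With β/γ > 1 the trajectory settles at the endemic steady state S* = γ/β, I* = ξ(1 − S*)/(γ + ξ).

```python
def sirs_euler(beta, gamma, xi, s0, i0, t_end, dt):
    """Forward-Euler integration of the basic SIRS model (population fractions):
        dS/dt = -beta*S*I + xi*R
        dI/dt =  beta*S*I - gamma*I
        dR/dt =  gamma*I  - xi*R
    Awareness compartments are omitted in this sketch."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    for _ in range(int(round(t_end / dt))):
        infection = beta * s * i
        recovery = gamma * i
        waning = xi * r
        s += dt * (-infection + waning)
        i += dt * (infection - recovery)
        r += dt * (recovery - waning)
    return s, i, r

beta, gamma, xi = 0.5, 0.1, 0.05  # hypothetical rates; R0 = beta / gamma = 5
s, i, r = sirs_euler(beta, gamma, xi, s0=0.99, i0=0.01, t_end=1000.0, dt=0.05)
print(f"S={s:.4f} I={i:.4f} R={r:.4f}")
# Endemic steady state: S* = gamma/beta = 0.2, I* = xi*(1 - S*)/(gamma + xi), about 0.2667
```

In the full model, the awareness terms modify the effective infection rate. The stability analysis then determines how the location and stability of these steady states shift.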

Recently, Eklund et al. (2016) analyzed clustering methods in standard FMRI packages: AFNI (which we maintain), FSL, and SPM [1]. They claimed: 1) false positive rates (FPRs) in traditional approaches are greatly inflated, questioning the validity of "countless published fMRI studies"; 2) nonparametric methods produce valid, but slightly conservative, FPRs; 3) a common flawed assumption is that the spatial autocorrelation function (ACF) of FMRI noise is Gaussian-shaped; and 4) a 15-year-old bug in AFNI's 3dClustSim significantly contributed to producing "particularly high" FPRs compared to other software.