Quantitative Biology - Quantitative Methods Publications (50)


Estimating vaccination uptake is an integral part of ensuring public health. It was recently shown that vaccination uptake can be estimated automatically from web data, instead of slowly collected clinical records or population surveys. All prior work in this area assumes that features of vaccination uptake collected from the web are temporally regular. Read More

The method of biomass estimation based on a volume-to-biomass relationship has conventionally been applied to estimate forest biomass from the mean volume (m³ ha⁻¹). However, few studies have been reported on the verification of volume-biomass equations regressed using field data. Possible bias may result from the volume measurements and from extrapolation from sample plots to stands or a unit area. Read More

The event-based model (EBM) for data-driven disease progression modeling estimates the sequence in which biomarkers for a disease become abnormal. This helps in understanding the dynamics of disease progression and facilitates early diagnosis by staging patients on a disease progression timeline. Existing EBM methods are all generative in nature. Read More

Consider the problem of modeling hysteresis for finite-state random walks using higher-order Markov chains. This Letter introduces a Bayesian framework to determine, from data, the number of prior states of recent history upon which a trajectory is statistically dependent. The general recommendation is to use leave-one-out cross validation, using an easily-computable formula that is provided in closed form. Read More

Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. Typically, these systems operate on domain-specific fractions of public medical data. MOLIERE, in contrast, utilizes information from over 24. Read More

Boolean matrix factorisation (BooMF) infers interpretable decompositions of a binary data matrix into a pair of low-rank, binary matrices: one containing meaningful patterns, the other quantifying how the observations can be expressed as a combination of these patterns. We introduce the OrMachine, a probabilistic generative model for BooMF and derive a Metropolised Gibbs sampler that facilitates very efficient parallel posterior inference. Our method outperforms all currently existing approaches for Boolean matrix factorisation and completion, as we show on simulated and real world data. Read More
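
To make the decomposition described above concrete, here is a minimal sketch of a Boolean matrix product in plain Python. It is not the OrMachine sampler; it only illustrates how a binary data matrix can be rebuilt from two binary factors with OR-of-ANDs. The function name `boolean_product`, the orientation of the factors, and the toy data are assumptions made for illustration.

```python
def boolean_product(codes, patterns):
    """Boolean matrix product (illustrative helper, not from the paper).

    codes:    N x L binary matrix; codes[n][l] says whether observation n uses pattern l.
    patterns: D x L binary matrix; patterns[d][l] says whether feature d belongs to pattern l.
    Returns the N x D matrix with entry [n][d] = OR over l of (codes[n][l] AND patterns[d][l]).
    """
    return [
        [any(c and p for c, p in zip(code_row, feature_row)) for feature_row in patterns]
        for code_row in codes
    ]


# Toy example: 3 observations, 4 features, 2 patterns (all values are made up).
codes = [[1, 0], [0, 1], [1, 1]]
patterns = [[1, 0], [1, 1], [0, 1], [0, 0]]  # pattern 0 = features {0, 1}; pattern 1 = features {1, 2}
reconstruction = boolean_product(codes, patterns)
```

In this toy case the third observation uses both patterns, so its reconstructed row is the element-wise OR of the first two rows. Because the combination is a Boolean OR, overlapping patterns do not add up.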

Parametric imaging is a compartmental approach that processes nuclear imaging data to estimate the spatial distribution of the kinetic parameters governing tracer flow. The present paper proposes a novel and efficient computational method for parametric imaging which is potentially applicable to several compartmental models of diverse complexity and which is effective in the determination of the parametric maps of all kinetic coefficients. We consider applications to [$^{18}$F]-fluorodeoxyglucose Positron Emission Tomography (FDG-PET) data and analyze the two-compartment catenary model describing the standard FDG metabolization by a homogeneous tissue and the three-compartment non-catenary model representing the renal physiology. Read More

Based on a set of subjects and a collection of descriptors obtained from the Alzheimer's Disease Neuroimaging Initiative database, we use redescription mining to find rules revealing associations between these determinants, providing insights into Alzheimer's disease (AD). We applied a four-step redescription mining algorithm (CLUS-RM), which has been extended to engender constraint-based redescription mining (CBRM) and enables several modes of targeted exploration of specific, user-defined associations. To a large extent we confirmed known findings, previously reported in the literature. Read More

The stochastic dynamics of networks of biochemical reactions in living cells are typically modelled using chemical master equations (CMEs). The stationary distributions of CMEs are seldom solvable analytically, and few methods exist that yield numerical estimates with computable error bounds. Here, we present two such methods based on mathematical programming techniques. Read More

Low grade gliomas (LGGs) are infiltrative and incurable primary brain tumours with typically slow evolution. These tumours usually occur in young and otherwise healthy patients, bringing controversies in treatment planning since aggressive treatment may lead to undesirable side effects. Thus, for management decisions it would be valuable to obtain early estimates of LGG growth potential. Read More

Heparan sulfate (HS) is a linear, polydisperse sulfated polysaccharide belonging to the glycosaminoglycan family. HS proteoglycans are ubiquitously found at the cell surface and extracellular matrix in animal species. HS is involved in the interaction with a wide variety of proteins and the regulation of many biological activities. Read More

We study the population size time series of a Neotropical small mammal with the intent of detecting and modelling population regulation processes generated by density-dependent factors and their possible delayed effects. The application of analysis tools based on principles of statistical generality is nowadays common practice for describing these phenomena but, in general, such tools are better at generating clear diagnoses than at supporting useful modelling. For this reason, in our approach, we detect the principal temporal structures on the basis of different correlation measures, and from these results we build an ad-hoc minimalist autoregressive model that incorporates the main drivers of the dynamics. Read More

This paper analyses an SIRS-type model for infectious diseases that accounts for behavioural changes associated with the simultaneous spread of awareness in the population. Two types of awareness are included in the model: private awareness associated with direct contacts between unaware and aware populations, and a public information campaign. Stability analysis of different steady states in the model provides information about the potential spread of disease in a population, as well as about how the disease dynamics are affected by the two types of awareness. Read More

Recently, Eklund et al. (2016) analyzed clustering methods in standard FMRI packages: AFNI (which we maintain), FSL, and SPM [1]. They claimed: 1) false positive rates (FPRs) in traditional approaches are greatly inflated, questioning the validity of "countless published fMRI studies"; 2) nonparametric methods produce valid, but slightly conservative, FPRs; 3) a common flawed assumption is that the spatial autocorrelation function (ACF) of FMRI noise is Gaussian-shaped; and 4) a 15-year-old bug in AFNI's 3dClustSim significantly contributed to producing "particularly high" FPRs compared to other software. Read More

Recent reports of inflated false positive rates (FPRs) in FMRI group analysis tools by Eklund et al. (2016) have become a large topic within (and outside) neuroimaging. They concluded that: existing parametric methods for determining statistically significant clusters had greatly inflated FPRs ("up to 70%," mainly due to the faulty assumption that the noise spatial autocorrelation function is Gaussian-shaped and stationary), calling into question potentially "countless" previous results; in contrast, nonparametric methods, such as their approach, accurately reflected nominal 5% FPRs. Read More

Magnetic resonance spectroscopy is universally regarded as one of the most important tools in chemical and bio-medical research. However, sensitivity limitations typically restrict imaging resolution to length scales greater than 10 $\mu$m. Here we bring quantum control to the detection of chemical systems to demonstrate high resolution electron spin imaging using the quantum properties of an array of nitrogen-vacancy (NV) centres in diamond. Read More

This work aimed to characterize activity series by applying concepts from fractal geometry, and to evaluate the possibility of identifying individuals with fibromyalgia. Activity level data were collected from 27 healthy subjects and 27 fibromyalgia patients, using clock-like devices equipped with accelerometers, for about four weeks, all day long. The activity series were evaluated through fractal and multifractal methods. Read More

Hidden Markov models (HMMs) are commonly used to model animal movement data and infer aspects of animal behavior. An HMM assumes that each data point from a time series of observations stems from one of $N$ possible states. The states are loosely connected to behavioral modes that manifest themselves at the temporal resolution at which observations are made. Read More

Rapid experimental advances now enable simultaneous electrophysiological recording of neural activity at single-cell resolution across large regions of the nervous system. Models of this neural network activity will necessarily increase in size and complexity, thus increasing the computational cost of simulating them and posing new challenges in analyzing them. Here we present a novel method to approximate the activity and firing statistics of a general firing rate network model (of Wilson-Cowan type) subject to noisy correlated background inputs. Read More

Human movements are physical processes combining the classical mechanics of the human body moving in space and the biomechanics of the muscles generating the forces acting on the body under sophisticated sensory-motor control. The characterization of the performance of human movements is a problem with important applications in clinical and sports research. One way to characterize movement performance is through measures of energy efficiency that relate the mechanical energy of the body and metabolic energy expended by the muscles. Read More

Neuroscientists are actively pursuing high-precision maps, or graphs, consisting of networks of neurons and connecting synapses in mammalian and non-mammalian brains. Such graphs, when coupled with physiological and behavioral data, are likely to facilitate greater understanding of how circuits in these networks give rise to complex information processing capabilities. Given that the automated or semi-automated methods required to achieve the acquisition of these graphs are still evolving, we develop a metric for measuring the performance of such methods by comparing their output with that generated by human annotators ("ground truth" data). Read More

This paper presents a monotonicity-based spatiotemporal conductivity imaging method for continuous regional lung monitoring using electrical impedance tomography (EIT). The EIT data (i.e. Read More

A Convolutional Neural Network was used to predict kidney function in patients with chronic kidney disease from high-resolution digital pathology scans of their kidney biopsies. Kidney biopsies were taken from participants of the NEPTUNE study, a longitudinal cohort study whose goal is to set up infrastructure for observing the evolution of 3 forms of idiopathic nephrotic syndrome, including developing predictors for progression of kidney disease. The knowledge of future kidney function is desirable as it can identify high-risk patients and influence treatment decisions, reducing the likelihood of irreversible kidney decline. Read More

Machine learning has been gaining traction in recent years to meet the demand for tools that can efficiently analyze and make sense of the ever-growing databases of biomedical data in health care systems around the world. However, effectively using machine learning methods requires considerable domain expertise, which can be a barrier of entry for bioinformaticians new to computational data science methods. Therefore, off-the-shelf tools that make machine learning more accessible can prove invaluable for bioinformaticians. Read More

Obtaining meaningful quantitative descriptions of the statistical dependence within multivariate systems is a difficult open problem. Recently, the Partial Information Decomposition (PID) was proposed to decompose mutual information (MI) about a target variable into components which are redundant, unique and synergistic within different subsets of predictor variables. Here, we propose to apply the elegant formalism of the PID to multivariate entropy, resulting in a Partial Entropy Decomposition (PED). Read More

Many contemporary statistical learning methods assume a Euclidean feature space; however, the "curse of dimensionality" associated with high feature dimensions is particularly severe for the Euclidean distance. This paper presents a method for defining similarity based on hyperspherical geometry and shows that it often improves the performance of support vector machines compared to other competing similarity measures. Specifically, the idea of using heat diffusion on a hypersphere to measure similarity has been proposed and tested by \citet{Lafferty:2015uy}, demonstrating promising results based on an approximate heat kernel; however, the exact hyperspherical heat kernel hitherto remains unknown. Read More

Random network models play a prominent role in modeling, analyzing and understanding complex phenomena on real-life networks. However, a key property of networks is often neglected: many real-world networks exhibit spatial structure, the tendency of a node to select neighbors with a probability depending on physical distance. Here, we introduce a class of random spatial networks (RSNs) which generalizes many existing random network models but adds spatial structure. Read More

Motivation: Epigenetic heterogeneity within a tumour can play an important role in tumour evolution and the emergence of resistance to treatment. It is increasingly recognised that the study of DNA methylation (DNAm) patterns along the genome -- so-called 'epialleles' -- offers greater insight into epigenetic dynamics than conventional analyses which examine DNAm marks individually. Results: We have developed a Bayesian model to infer which epialleles are present in multiple regions of the same tumour. Read More

We have developed an efficient information-maximization method for computing the optimal shapes of tuning curves of sensory neurons by optimizing the parameters of the underlying feedforward network model. When applied to the problem of population coding of visual motion with multiple directions, our method yields several types of tuning curves with both symmetric and asymmetric shapes that resemble those found in the visual cortex. Our result suggests that the diversity or heterogeneity of tuning curve shapes observed in neurophysiological experiments might actually constitute an optimal population representation of visual motions with multiple components. Read More

When setting up field experiments to test and compare a range of genotypes (e.g. maize hybrids), it is important to account for any possible field effect that may otherwise bias performance estimates of genotypes. Read More

In 2015 the US federal government sponsored a dengue forecasting competition using historical case data from Iquitos, Peru and San Juan, Puerto Rico. Competitors were evaluated on several aspects of out-of-sample forecasts including the targets of peak week, peak incidence during that week and total season incidence across each of several seasons. Our team was one of the top performers of that competition, outperforming all other teams in multiple targets/locales. Read More

Inferring and comparing complex, multivariable probability density functions is a fundamental problem in several fields, including probabilistic learning, network theory, and data analysis. Classification and prediction are the two faces of this class of problem. We take an approach here that simplifies many aspects of these problems by presenting a structured series expansion of the Kullback-Leibler divergence - a function central to information theory - and devise a distance metric based on this divergence. Read More

In this paper, we show that many structured epidemic models may be described using a straightforward product structure. Such products, derived from products of directed graphs, may represent useful refinements including geographic and demographic structure, age structure, gender, risk groups, or immunity status. Extension to multi-strain dynamics, i. Read More

We consider evolving networks in which each node can have various associated properties (a state) in addition to those that arise from network structure. For example, each node can have a spatial location and a velocity, or some more abstract internal property that describes something like a social trait. Edges between nodes are created and destroyed, and new nodes enter the system. Read More

We discuss the notorious problem of order selection in hidden Markov models, i.e. of selecting an adequate number of states, highlighting typical pitfalls and practical challenges arising when analyzing real data. Read More

Networks describe a range of social, biological and technical phenomena. An important property of a network is its degree correlation or assortativity, describing how nodes in the network associate based on their number of connections. Social networks are typically thought to be distinct from other networks in being assortative (possessing positive degree correlations); well-connected individuals associate with other well-connected individuals, and poorly-connected individuals associate with each other. Read More

Motivation: Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization at near-native state and at sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organizations inside single cells. However, the high degree of structural complexity, together with practical imaging limitations, makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Read More

As high-throughput biological sequencing becomes faster and cheaper, the need to extract useful information from sequencing becomes ever more paramount, often limited by low-throughput experimental characterizations. For proteins, accurate prediction of their functions directly from their primary amino-acid sequences has been a long standing challenge. Here, machine learning using artificial recurrent neural networks (RNN) was applied towards classification of protein function directly from primary sequence without sequence alignment, heuristic scoring or feature engineering. Read More

We introduce a new family of minmax rank aggregation problems under two distance measures, the Kendall $\tau$ and the Spearman footrule. As the problems are NP-hard, we proceed to describe a number of constant-approximation algorithms for solving them. We conclude with illustrative applications of the aggregation methods on the Mallows model and genomic data. Read More
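
For readers unfamiliar with the two distance measures named above, the following sketch computes the Kendall tau distance (number of discordant pairs) and the Spearman footrule (sum of absolute rank displacements) between two rankings of the same items. The brute-force `minmax_aggregate` helper is a hypothetical illustration that assumes "minmax" means minimising the largest distance to any input ranking; it enumerates all permutations, so it is only usable for a handful of items, and it is not one of the approximation algorithms the abstract refers to.

```python
from itertools import combinations, permutations


def kendall_tau_distance(a, b):
    """Number of item pairs that appear in opposite relative order in rankings a and b."""
    pos_b = {item: i for i, item in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x ranked before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def spearman_footrule(a, b):
    """Sum over items of the absolute difference between their positions in a and b."""
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(a))


def minmax_aggregate(rankings, distance):
    """Exhaustive search (assumed problem reading): the ranking whose worst-case distance is smallest."""
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda candidate: max(distance(candidate, r) for r in rankings),
    )


a = ["A", "B", "C", "D"]
b = ["B", "A", "D", "C"]
tau = kendall_tau_distance(a, b)
footrule = spearman_footrule(a, b)

votes = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
best = minmax_aggregate(votes, kendall_tau_distance)
worst_case = max(kendall_tau_distance(best, r) for r in votes)
```

The two measures are closely related: for any pair of full rankings the footrule lies between the Kendall tau distance and twice that value (the Diaconis-Graham inequality), which the toy pair above satisfies.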

Salmon farming has become a prosperous international industry over the last decades. Along with growth in the production of farmed salmon, however, an increasing threat by pathogens has emerged. Of special concern is the propagation and spread of the salmon louse, Lepeophtheirus salmonis. Read More

We introduce a low dimensional function of the site frequency spectrum that is tailor-made for distinguishing coalescent models with multiple mergers from Kingman coalescent models with population growth, and use this function to construct a hypothesis test between these two model classes. The null and alternative sampling distributions of our statistic are intractable, but its low dimensionality renders these distributions amenable to Monte Carlo estimation. We construct kernel density estimates of the sampling distributions based on simulated data, and show that the resulting hypothesis test dramatically improves on the statistical power of a current state-of-the-art method. Read More

Virus binding to a surface results at least locally, at the contact area, in stress and potential structural perturbation of the virus cage. Here we address the question of the role of substrate-induced deformation in the overall virus mechanical response to the adsorption event. This question may be especially important for the broad category of viruses that have their shells stabilized by weak, non-covalent interactions. Read More

Amino-acid substitutions are implicated in a wide range of human diseases, many of which are lethal. Distinguishing such mutations from polymorphisms without significant effect on human health is a necessary step in understanding the etiology of such diseases. Computational methods can be used to select interesting mutations within a larger set, to corroborate experimental findings and to elucidate the cause of the deleterious effect. Read More

We describe here the recent results of a multidisciplinary effort to design a biomarker that can actively and continuously decode the progressive changes in neuronal organization leading to epilepsy, a process known as epileptogenesis. Using an animal model of acquired epilepsy, we chronically record hippocampal evoked potentials elicited by an auditory stimulus. Using a set of reduced coordinates, our algorithm can identify universal smooth low-dimensional configurations of the auditory evoked potentials that correspond to distinct stages of epileptogenesis. Read More

During embryogenesis tissue layers continuously rearrange and fold into specific shapes. Developmental biology has identified patterns of gene expression and cytoskeletal regulation underlying local tissue dynamics, but how actions of multiple domains of distinct cell types coordinate to remodel tissues at the organ scale remains unclear. We use in toto light-sheet microscopy, automated image analysis, and physical modeling to quantitatively investigate the link between kinetics of global tissue transformations and force generation patterns during Drosophila gastrulation. Read More

Estimates of age-specific natural (M) and fishing (F) mortalities among economically important stocks are required to determine sustainable yields and, ultimately, facilitate effective resource management. Here we used hazard functions to estimate mortality rates for eastern sea garfish, Hyporhamphus australis, a pelagic species that forms the basis of an Australian commercial lampara-net fishery. Data describing annual (2004 to 2015) age frequencies (0-1 to 5-6 years), yield, effort (boat-days), and average weights at age were used to fit various stochastic models to estimate mortality rates by maximum likelihood. Read More

P-values are being computed for increasingly complicated statistics, but evaluations of their quality are lacking. Meanwhile, accurate p-values enable significance comparison across batches of hypothesis tests and consequently unified false discovery rate (FDR) control. This article discusses two related questions in this setting. Read More

Viewing the trajectory of a patient as a dynamical system, a recurrent neural network was developed to learn the course of patient encounters in the Pediatric Intensive Care Unit (PICU) of a major tertiary care center. Data extracted from Electronic Medical Records (EMR) of about 12000 patients who were admitted to the PICU over a period of more than 10 years were leveraged. The RNN model ingests a sequence of measurements which include physiologic observations, laboratory results, administered drugs and interventions, and generates temporally dynamic predictions for in-ICU mortality at user-specified times. Read More

One ubiquitous representation of a long DNA sequence divides it into shorter k-mer components. Unfortunately, the straightforward encoding of a k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse yet, all pairs of distinct one-hot vectors are equidistant. Read More
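
The two drawbacks mentioned above can be checked directly. The sketch below builds one-hot vectors for all k-mers over an assumed four-letter DNA alphabet and collects the pairwise Hamming distances; the helper names `one_hot_kmers` and `hamming` are illustrative and not taken from the paper.

```python
from itertools import product


def one_hot_kmers(k, alphabet="ACGT"):
    """Map every k-mer over the alphabet to a one-hot vector of length len(alphabet) ** k."""
    kmers = ["".join(letters) for letters in product(alphabet, repeat=k)]
    size = len(kmers)
    return {
        kmer: [1 if i == j else 0 for j in range(size)]
        for i, kmer in enumerate(kmers)
    }


def hamming(u, v):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(u, v))


vectors = one_hot_kmers(2)
dimension = len(next(iter(vectors.values())))  # grows as 4 ** k
pairwise = {
    hamming(vectors[a], vectors[b])
    for a in vectors
    for b in vectors
    if a != b
}
```

For k = 2 the vectors already have 16 coordinates, and `pairwise` contains a single value: every pair of distinct k-mers differs in exactly two coordinates, so the encoding carries no notion of one k-mer being more similar to another.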

Although redistribution of red blood cells at bifurcated vessels is highly dependent on flow rate, it is still challenging to quantitatively express the dependence on flow rate in plasma skimming due to nonlinear cellular interactions. We suggest a plasma skimming model that can incorporate the effect of fractional blood flow at each bifurcation point. To validate the new model, it is compared with in vivo data at single bifurcation points, as well as microvascular network systems. Read More