Quantitative Biology - Quantitative Methods Publications (50)


Ammonium assimilation in E. coli is regulated by two paralogous proteins (GlnB and GlnK), which orchestrate interactions with regulators of gene expression, transport proteins and metabolic pathways. Yet how they conjointly modulate the activity of glutamine synthetase (GS), the key enzyme for nitrogen assimilation, is poorly understood.

This paper analyses the dynamics of infectious disease with a concurrent spread of disease awareness. The model includes local awareness due to contacts with aware individuals, as well as global awareness due to reported cases of infection and awareness campaigns. We investigate how a time delay in the response of unaware individuals to available information affects the epidemic dynamics by establishing conditions for a Hopf bifurcation of the endemic steady state of the model.

Manual measurement of blood vessel diameter is a conventional component of routine visual assessment of the microcirculation, for example during optical capillaroscopy. However, many modern optical methods for blood flow measurement require a reliable procedure for fully automated detection of vessels and estimation of their diameter, which is a challenging task. Specifically, if one measures the velocity of red blood cells by means of laser speckle imaging, visual measurements become impossible, while velocity-based estimation has its own limitations.

Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined.

The aim of this paper was to develop statistical models for estimating individual breed composition, building on the previously proposed idea of regressing the counts of reference alleles at biallelic molecular markers across the genome (treated as discrete random variables) on the allele frequencies of each marker in the pure (base) breeds. Some existing regression-based methods do not guarantee that the estimators of breed composition lie in the appropriate parameter space, and none of them accounts for uncertainty about allele frequencies in the pure breeds, that is, uncertainty about the design matrix. To overcome these limitations, we proposed two Bayesian generalized linear models.
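The regression idea above can be illustrated with a plain constrained least-squares baseline. This is a minimal sketch, not the authors' Bayesian models. It assumes that the expected reference-allele count of a diploid individual at marker j is 2 Σ_b q_b p_jb. It treats the pure-breed frequencies p_jb as known, which is exactly the simplification the paper relaxes. The estimate q is kept on the simplex by projected gradient descent. The function names are hypothetical.

```python
# Hypothetical baseline: least-squares breed composition constrained to the simplex.
# Assumes E[x_j] = 2 * sum_b q_b * p_jb for a diploid individual.

def project_to_simplex(v):
    """Euclidean projection of v onto {q : q_b >= 0, sum_b q_b = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index satisfying the condition defines the threshold
    return [max(vi - theta, 0.0) for vi in v]


def estimate_breed_composition(counts, freqs, steps=5000):
    """Projected gradient descent on sum_j (x_j - 2 * p_j . q)^2.

    counts: reference-allele counts x_j, one per marker.
    freqs:  freqs[j][b] = reference-allele frequency of marker j in pure breed b,
            treated as known (no uncertainty in the design matrix).
    """
    n_breeds = len(freqs[0])
    # Step size 1 / (2 * ||X||_F^2) with X = 2P is at most 1 / Lipschitz constant.
    frob = sum((2.0 * p) ** 2 for row in freqs for p in row)
    lr = 1.0 / (2.0 * frob)
    q = [1.0 / n_breeds] * n_breeds
    for _ in range(steps):
        grad = [0.0] * n_breeds
        for x_j, p_j in zip(counts, freqs):
            residual = 2.0 * sum(qb * pb for qb, pb in zip(q, p_j)) - x_j
            for b in range(n_breeds):
                grad[b] += 2.0 * residual * 2.0 * p_j[b]
        q = project_to_simplex([qb - lr * g for qb, g in zip(q, grad)])
    return q
```

The projection guarantees that the estimate lies in the parameter space (non-negative proportions summing to 1), which addresses the first limitation named above but not the second.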

Median lethal dose (LD50) is a general indicator of compound acute oral toxicity (AOT). Various in silico methods have been developed for AOT prediction to reduce costs and time. In this study, a deep learning architecture composed of a multi-layer convolutional neural network was used to develop three types of high-level predictive models for AOT evaluation: a regression model (deepAOT-R), a multi-classification model (deepAOT-C) and a multitask model (deepAOT-CR).

Intra-tumour phenotypic heterogeneity limits the accuracy of clinical diagnostics and hampers the efficiency of anti-cancer therapies. Dealing with this cellular heterogeneity requires an adequate understanding of its sources, which is extremely difficult, as phenotypes of tumour cells integrate hardwired (epi)mutational differences with dynamic responses to microenvironmental cues. The latter come in the form of both direct physical interactions and inputs from gradients of secreted signalling molecules.

Here we report a method for visualizing volumetric structural information of live biological samples without exogenous contrast agents. The process is made possible by a technique that involves the generation, synthesis and analysis of three-dimensional (3D) Fourier components of light diffracted by the sample. This leads to the direct recovery of quantitative cellular morphology without iterative procedures, which reduces computational complexity.

We investigate the rates of drug resistance acquisition in a natural population using molecular epidemiological data from Bolivia. First, we study the rate of direct acquisition of double resistance from the double sensitive state within patients and compare it to the rates of evolution to single resistance. In particular, we address whether or not double resistance can evolve directly from a double sensitive state within a given host.

Predicting the biological function of molecules, be they proteins or drug-like compounds, from their atomic structure is an important and long-standing problem. Function is dictated by structure, since it is through spatial contact that molecules interact with each other, in terms of both steric complementarity and intermolecular forces. Thus, the electron density field and electrostatic potential field of a molecule contain the "raw fingerprint" of how this molecule can fit to binding partners.

Objective: Due to the non-linearity of numerous biomedical signals, non-linear analysis of multi-channel time series, notably multivariate multiscale entropy (mvMSE), has been used extensively in biomedical signal processing. However, mvMSE has three drawbacks: 1) mvMSE values are either undefined or unreliable for short signals; 2) mvMSE is not fast enough for real-time applications; and 3) computing mvMSE for signals with a large number of channels requires storing a huge number of elements. Methods: To deal with these problems and improve the stability of mvMSE, we introduce multivariate multiscale dispersion entropy (mvMDE), an extension of our recently developed multiscale dispersion entropy (MDE), to quantify the complexity of multivariate time series.
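For readers unfamiliar with dispersion entropy, the univariate building block can be sketched as follows. The sketch assumes the common recipe: samples are mapped to c classes through the normal CDF, patterns of embedding dimension m are counted, and the Shannon entropy of those dispersion patterns is normalized by ln c^m. It uses standard coarse-graining for the multiscale step. The multivariate extension (mvMDE) introduced in the paper is not reproduced, and the parameter defaults are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev


def dispersion_entropy(x, m=2, c=6, delay=1):
    """Normalized univariate dispersion entropy (values in [0, 1])."""
    nd = NormalDist(mean(x), stdev(x))
    # Map each sample to (0, 1) with the normal CDF, then to classes 1..c
    # (equivalent to round(c*y + 0.5) with halves rounded up).
    z = [min(c, max(1, math.floor(c * nd.cdf(v) + 1.0))) for v in x]
    n_patterns = len(z) - (m - 1) * delay
    patterns = Counter(
        tuple(z[i + k * delay] for k in range(m)) for i in range(n_patterns)
    )
    h = -sum((n / n_patterns) * math.log(n / n_patterns) for n in patterns.values())
    return h / math.log(c ** m)  # c**m possible dispersion patterns


def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale` (standard multiscale step)."""
    return [mean(x[i:i + scale]) for i in range(0, len(x) - scale + 1, scale)]


def multiscale_dispersion_entropy(x, max_scale=5, **kwargs):
    """Dispersion entropy of the coarse-grained series at scales 1..max_scale."""
    return [dispersion_entropy(coarse_grain(x, s), **kwargs)
            for s in range(1, max_scale + 1)]
```

Because only pattern counts are stored (at most c^m of them), the memory cost does not grow with signal length, which is the kind of advantage over sample-entropy-based measures the abstract alludes to.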

Identifying disease genes in the human genome is an important and fundamental problem in biomedical research. Despite many published machine learning methods for discovering new disease genes, the task remains challenging because of the pleiotropy of genes, the limited number of confirmed disease genes in the whole genome and the genetic heterogeneity of diseases. Recent approaches have applied the concept of 'guilt by association' to investigate the association between a disease phenotype and its causative genes: candidate genes with characteristics similar to those of known disease genes are more likely to be associated with disease.
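The 'guilt by association' principle can be made concrete with a simple neighbour-counting score on a gene network. This is an illustrative assumption, not the scoring used by any specific method referred to here. The network and the gene names are hypothetical.

```python
def guilt_by_association_scores(edges, known_disease_genes):
    """Score each candidate gene by the fraction of its neighbours that are known disease genes.

    edges: iterable of undirected (gene_a, gene_b) pairs, e.g. from a
    hypothetical protein-interaction network.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    known = set(known_disease_genes)
    return {
        gene: len(nbrs & known) / len(nbrs)
        for gene, nbrs in neighbours.items()
        if gene not in known  # rank only genes not already confirmed
    }
```

Real methods typically propagate scores over the whole network rather than looking only at direct neighbours, but the underlying assumption is the same.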

Particle tracking is a powerful biophysical tool that requires conversion of large video files into position time series, i.e. traces of the species of interest for data analysis.
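As a minimal sketch of the video-to-trace step, the functions below localize a single bright particle per frame by its background-subtracted, intensity-weighted centroid. Frames are assumed to be plain 2D lists. The sketch deliberately leaves out steps that practical trackers perform, such as detecting multiple particles, sub-pixel model fitting and linking positions across frames.

```python
from statistics import median


def localize_spot(frame, background=None):
    """Intensity-weighted centroid (row, col) of a single bright spot.

    frame: 2D list of pixel intensities. background defaults to the frame median.
    Returns None if no pixel exceeds the background.
    """
    if background is None:
        background = median(v for row in frame for v in row)
    total = r_acc = c_acc = 0.0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            w = max(value - background, 0.0)  # background-subtracted, clipped weight
            total += w
            r_acc += w * r
            c_acc += w * c
    if total == 0.0:
        return None
    return r_acc / total, c_acc / total


def trace_from_frames(frames):
    """Position time series: one (row, col) estimate per frame."""
    return [localize_spot(f) for f in frames]
```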

The biophysical analysis of dynamically formed multi-protein complexes in solution presents a formidable technical challenge. Sedimentation velocity (SV) analytical ultracentrifugation achieves strongly size-dependent hydrodynamic resolution of species of different sizes, and can be combined with multi-component detection by exploiting different spectral properties or temporally modulated signals from photoswitchable proteins. Coexisting complexes arising from self- or hetero-associations that can be distinguished in SV allow measurement of their stoichiometry, affinity, and cooperativity.

MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression at the post-transcriptional level. Determining whether a sequence segment is a miRNA is experimentally challenging, and experimental results are sensitive to the experimental environment.

Medical research makes it possible to acquire diverse types of data from the same individual for a particular cancer. Recent studies show that using such diverse data results in more accurate predictions. The major challenge is how to use such diverse data sets effectively.

A method is proposed to generate an optimal fit of a number of connected linear trend segments onto time-series data. To be able to efficiently handle many lines, the method employs a stochastic search procedure to determine optimal transition point locations. Traditional methods use exhaustive grid searches, which severely limit the scale of the problems for which they can be utilized.
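The idea of fitting connected segments with a stochastic search over transition points can be sketched as follows. The sketch rests on three assumptions. Segments are kept connected by a hinge basis 1, x, max(0, x - t_k). Coefficients for fixed transition points come from ordinary least squares. The "stochastic search" is stood in for by a simple random-perturbation hill climb, which is not necessarily the procedure the paper uses.

```python
import random


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_connected_segments(xs, ys, breaks):
    """Least-squares continuous piecewise-linear fit with fixed transition points.

    Basis: 1, x, and one hinge max(0, x - t) per transition point t, which
    keeps adjacent segments connected. Returns (coefficients, SSE).
    """
    rows = [[1.0, x] + [max(0.0, x - t) for t in breaks] for x in xs]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = _solve(A, b)
    sse = sum((sum(c * v for c, v in zip(coef, r)) - y) ** 2 for r, y in zip(rows, ys))
    return coef, sse


def stochastic_transition_search(xs, ys, n_breaks, iters=2000, seed=0):
    """Random-perturbation hill climb over transition-point locations."""
    rng = random.Random(seed)
    xs_sorted = sorted(xs)
    lo, hi = xs_sorted[1], xs_sorted[-2]  # keep hinges inside the data
    best = [lo + (hi - lo) * (i + 1) / (n_breaks + 1) for i in range(n_breaks)]
    best_sse = fit_connected_segments(xs, ys, best)[1]
    step = (hi - lo) / 10
    for _ in range(iters):
        cand = sorted(min(hi, max(lo, t + rng.gauss(0.0, step))) for t in best)
        try:
            sse = fit_connected_segments(xs, ys, cand)[1]
        except ValueError:  # e.g. two transition points collapsed onto each other
            continue
        if sse < best_sse:
            best, best_sse = cand, sse
    return best, best_sse
```

Each candidate costs one small least-squares solve, so the search scales with the number of iterations rather than with the size of a grid over all transition-point combinations.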

In this paper, we describe a numerical reconstruction method for quantitative photoacoustic tomography (QPAT) based on the radiative transfer equation (RTE), which models light propagation more accurately than the diffusion approximation (DA). We investigate the reconstruction of the absorption coefficient and/or the scattering coefficient of biological tissues. Given the scattering coefficient, an improved fixed-point iterative method, chosen for its low computational cost, is proposed to retrieve the absorption coefficient.
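To show the shape of a fixed-point update for the absorption coefficient when the scattering coefficient is given, here is a toy sketch. It replaces the RTE with a hypothetical 1D Beer-Lambert fluence model that ignores scattering. It uses the basic update μ_a ← H / (Γ Φ(μ_a)), where H is the absorbed-energy map and Γ a Grüneisen parameter. It is not the paper's improved method.

```python
import math

# Toy forward model (hypothetical, NOT the RTE): a 1D stack of voxels of
# thickness dz, with fluence attenuated by absorption in the voxels above,
# phi_j = phi0 * exp(-dz * sum_{i<j} mu_i). Scattering is ignored.


def fluence(mu_a, phi0=1.0, dz=1.0):
    phi, mu_sum = [], 0.0
    for mu in mu_a:
        phi.append(phi0 * math.exp(-dz * mu_sum))
        mu_sum += mu
    return phi


def absorbed_energy(mu_a, gruneisen=0.8, **kw):
    """Photoacoustic source H_j = Gamma * mu_a,j * phi_j."""
    return [gruneisen * mu * phi for mu, phi in zip(mu_a, fluence(mu_a, **kw))]


def fixed_point_mu_a(h, gruneisen=0.8, iters=50, tol=1e-12, **kw):
    """Basic fixed-point update mu^{k+1} = H / (Gamma * phi(mu^k))."""
    mu = [0.0] * len(h)
    for _ in range(iters):
        phi = fluence(mu, **kw)
        new = [hj / (gruneisen * pj) for hj, pj in zip(h, phi)]
        if max(abs(a - b) for a, b in zip(new, mu)) < tol:
            return new
        mu = new
    return mu
```

In this triangular 1D toy, each voxel depends only on the voxels above it, so the iteration settles after as many sweeps as there are voxels. With a realistic RTE forward model, convergence is not guaranteed to be this simple.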

Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal components analysis (PCA). However, the application of PCA is not straightforward for multi-source data, wherein multiple sources of 'omics data measure different but related biological components. In this article we utilize recent advances in the dimension reduction of multi-source data for predictive modeling.

Motivation: Single-cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool for studying cellular and molecular processes at single-cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these technological advances, statistical methods and computational tools for analyzing droplet-based scRNA-Seq data are still lacking.
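Direct counting of transcript copies with UMIs amounts to collapsing reads that share a cell barcode, gene and UMI. The sketch below assumes that reads have already been demultiplexed and assigned to genes. The tuple layout is hypothetical.

```python
from collections import defaultdict


def umi_count_matrix(reads):
    """Collapse reads to molecules: count distinct UMIs per (cell barcode, gene).

    reads: iterable of (cell_barcode, gene, umi) tuples. Correction of UMI
    sequencing errors, which real pipelines perform, is omitted.
    """
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(s) for key, s in umis.items()}
```

The resulting sparse cell-by-gene count table is the typical input to the statistical methods the abstract calls for.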

High-throughput sequencing is a technology that generates millions of reads of genomic data for a study of interest, and data from high-throughput sequencing platforms are usually count compositions. Subsequent analysis of such data can yield information on transcription profiles, microbial diversity, or even relative cellular abundance in culture. Because of the high cost of acquisition, the data are usually sparse and always contain far fewer observations than variables.
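Because such data are compositional, a common preprocessing step is a log-ratio transform. The sketch below shows the centered log-ratio (clr) transform. The pseudocount used to handle zeros in sparse data is an assumption (0.5 is a frequent but arbitrary choice), and no claim is made that the paper uses this transform.

```python
import math


def closure(counts):
    """Rescale a count vector to proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log(x_i) minus the mean log, after adding a pseudocount."""
    logs = [math.log(c + pseudocount) for c in counts]
    g = sum(logs) / len(logs)  # log of the geometric mean
    return [v - g for v in logs]
```

Without a pseudocount the clr is invariant to the total read depth of a sample. Adding a pseudocount breaks this invariance slightly, which is one reason the choice matters for sparse data.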

Atomic Force Microscopy - Infrared (AFM-IR) spectroscopy allows spectroscopic studies in the mid-infrared spectral region with a spatial resolution better than 50 nm. We show that the high spatial resolution can be used to perform spectroscopic and imaging studies at the subcellular level in fixed eukaryotic cells. We collect AFM-IR images of subcellular structures that include lipid droplets, vesicles and cytoskeletal filaments, by relying on the intrinsic contrast from IR light absorption.

This contribution reports an application of a novel feature extraction technique based on Multifractal Detrended Fluctuation Analysis (MFDFA) for automated detection of epilepsy. In fractal geometry, MFDFA is a popular technique for examining the self-similarity of nonlinear, chaotic and noisy time series. In the present work, EEG signals representing healthy, interictal (seizure-free) and ictal (seizure) activity are taken from an existing database.
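The MFDFA procedure referred to above can be sketched in a few steps:

1. Build the profile, the cumulative sum of the mean-removed series.
2. Split the profile into segments of length s.
3. Remove a linear trend from each segment.
4. Average the residual variances with exponent q/2.
5. Read the generalized Hurst exponent h(q) off a log-log slope.

This minimal version assumes first-order detrending and takes segments from the start of the series only. Which MFDFA-derived features the study feeds to its classifier is not specified here.

```python
import math


def _ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def _detrended_variance(segment):
    """Mean squared residual of `segment` about its least-squares line."""
    n = len(segment)
    t = list(range(n))
    slope = _ols_slope(t, segment)
    intercept = sum(segment) / n - slope * (n - 1) / 2
    return sum((y - intercept - slope * ti) ** 2 for ti, y in zip(t, segment)) / n


def mfdfa(x, scales, qs=(-3, -1, 0, 1, 2, 3)):
    """Generalized Hurst exponents h(q) via first-order MFDFA."""
    mu = sum(x) / len(x)
    profile, acc = [], 0.0
    for v in x:  # cumulative sum of the mean-removed series
        acc += v - mu
        profile.append(acc)
    log_s = [math.log(s) for s in scales]
    log_f = {q: [] for q in qs}
    for s in scales:
        n_seg = len(profile) // s  # non-overlapping segments from the start only
        f2 = [_detrended_variance(profile[v * s:(v + 1) * s]) for v in range(n_seg)]
        for q in qs:
            if q == 0:  # logarithmic average replaces the q-th power mean
                log_f[q].append(0.5 * sum(math.log(f) for f in f2) / n_seg)
            else:
                log_f[q].append(math.log((sum(f ** (q / 2) for f in f2) / n_seg) ** (1 / q)))
    # h(q) is the slope of log F_q(s) against log s
    return {q: _ols_slope(log_s, log_f[q]) for q in qs}
```

For uncorrelated noise h(2) is close to 0.5, and for its running sum it is close to 1.5. A spread of h(q) across q signals multifractality, which is the property such feature sets exploit.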

Multiple sequence alignment (MSA) plays a key role in biological sequence analysis, especially in phylogenetic tree construction. The explosive growth of next-generation sequencing data has exposed a shortage of efficient approaches for aligning ultra-large sets of sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large sequence analyses.

Miniscope calcium imaging is increasingly used to monitor the activity of large neuronal populations in freely behaving animals. However, because of the high background and low signal-to-noise ratio of the single-photon imaging used in this technique, automatically extracting neural signals from large numbers of imaged cells has remained challenging. Here we describe seeds cleansing Constrained Nonnegative Matrix Factorization (sc-CNMF), a highly accurate framework for automatically identifying activated neurons and extracting calcium signals from miniscope imaging data.

As described in the work of Mietke et al. (1), the deformation (defined as 1 - circularity [see (2)]) of a purely elastic, spherical object deformed in a real-time deformability cytometry (RT-DC) experiment can be mapped to its apparent Young's modulus. This note is intended to support fast and correct mapping of RT-DC results, namely deformation and size, to values of the apparent Young's modulus E.
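The deformation definition above can be computed directly from a closed cell contour. This minimal sketch assumes the contour is given as polygon vertices and that circularity is 2√(πA)/P, with area A and perimeter P. The mapping from deformation and size to the apparent Young's modulus relies on the model of Mietke et al. and is not reproduced here.

```python
import math


def deformation_from_contour(points):
    """Deformation = 1 - circularity, with circularity = 2*sqrt(pi*A)/P.

    points: closed polygon as a list of (x, y) vertices (the last joins the first).
    """
    n = len(points)
    area2 = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0  # shoelace formula (twice the signed area)
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(area2) / 2.0
    circularity = 2.0 * math.sqrt(math.pi * area) / perimeter
    return 1.0 - circularity
```

A circle gives a deformation of 0, and any other shape gives a positive value. For example, a unit square gives 1 - √π/2 ≈ 0.114.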

Motivation: Spatial pattern formation of the primary anterior-posterior morphogenetic gradient of the transcription factor Bicoid (Bcd) has been studied experimentally and computationally for many years. Bcd specifies positional information for the downstream segmentation genes, affecting the fly body plan. More recently, a number of researchers have focused on the patterning dynamics of the underlying bcd mRNA gradient, which is translated into Bcd protein.

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e.

Epistasis, or the context-dependence of the effects of mutations, limits our ability to predict the functional impact of combinations of mutations, and ultimately our ability to predict evolutionary trajectories. Information about the context-dependence of mutations can essentially be obtained in two ways: first, by experimentally measuring the functional effects of combinations of mutations and calculating the epistatic contributions directly; and second, by statistical analysis of the frequencies and co-occurrences of protein residues in a multiple sequence alignment of protein homologs. In this manuscript, we derive the mathematical relationship between epistasis calculated from functional measurements and the covariance calculated from a multiple sequence alignment.
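The two quantities being related can be written down in their simplest forms. The sketch assumes functional measurements on an additive scale (e.g. log fitness), so that epistasis is the deviation of the double mutant from additivity. It uses raw, unweighted alignment frequencies for the covariance, whereas practical coevolution analyses usually add sequence reweighting and pseudocounts. The relationship derived in the paper is not reproduced.

```python
def pairwise_epistasis(y00, y10, y01, y11):
    """Deviation of the double mutant from additivity: y11 - y10 - y01 + y00.

    Assumes y is measured on a scale where independent effects add
    (e.g. log fitness or free energy).
    """
    return y11 - y10 - y01 + y00


def msa_covariance(alignment, i, j, a, b):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b) from an unweighted alignment."""
    n = len(alignment)
    f_i = sum(seq[i] == a for seq in alignment) / n
    f_j = sum(seq[j] == b for seq in alignment) / n
    f_ij = sum(seq[i] == a and seq[j] == b for seq in alignment) / n
    return f_ij - f_i * f_j
```

Both quantities vanish under independence: additive effects give zero epistasis, and residues that co-occur only by chance give zero covariance.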

Protein-ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, gene expression, etc. Accurate prediction of protein-ligand binding affinities is vital to rational drug design and the understanding of protein-ligand binding and binding induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data.

Motivation: Site-directed mutagenesis is widely used to understand the structure and function of biomolecules. Computational prediction of protein mutation impacts offers a fast, economical and potentially accurate alternative to laboratory mutagenesis. Whereas most existing methods rely on geometric descriptions, this work introduces a topology-based approach that provides an entirely new representation of protein mutation impacts, one that could not be obtained from conventional techniques.

Toxicity analysis and prediction are of paramount importance to human health and environmental protection. Existing computational methods are built from a wide variety of descriptors and regressors, which makes their performance analysis difficult. For example, a deep neural network (DNN), a successful approach on many occasions, acts like a black box and offers little conceptual elegance or physical understanding.

We present a feature functional theory - binding predictor (FFT-BP) for protein-ligand binding affinity prediction. The underpinning assumptions of FFT-BP are as follows: i) representability: there exists a microscopic feature vector that can uniquely characterize and distinguish one protein-ligand complex from another; ii) feature-function relationship: the macroscopic features of a complex, including binding free energy, are functionals of microscopic feature vectors; and iii) similarity: molecules with similar microscopic features have similar macroscopic features, such as binding affinity. Physical models, such as implicit solvent models and quantum theory, are used to extract microscopic features, while machine learning algorithms are employed to rank the similarity among protein-ligand complexes.

Motion detection and position tracking of animal behavior over time produce massive amounts of information, but analyzing and interpreting such huge datasets is challenging. Here we describe statistical methods to extract major movement structures of Drosophila locomotion in a circular arena, and examine the effects of pulsed light stimulation on these identified locomotor structures. Drosophila adults performed exploratory behavior when restrained individually in the circular arenas (1.

Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains.
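Power-law scaling of this kind is typically quantified by a straight-line fit in log-log space. This minimal sketch assumes ordinary least squares on log category size versus log genome size, with all sizes positive. The paper's exact fitting procedure is not stated in this excerpt.

```python
import math


def fit_power_law(genome_sizes, category_sizes):
    """Fit n_c ~ A * n**beta by least squares on log n_c versus log n."""
    lx = [math.log(v) for v in genome_sizes]
    ly = [math.log(v) for v in category_sizes]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    beta = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
    prefactor = math.exp(my - beta * mx)
    return prefactor, beta
```

The exponent beta is the quantity of interest. Values above 1 mean that a category grows faster than the genome as a whole, and values below 1 mean it grows more slowly.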

Causal ordering of key events in the cell cycle is essential for proper functioning of an organism. Yet, it remains a mystery how a specific temporal program of events is maintained despite ineluctable stochasticity in the biochemical dynamics which dictate timing of cellular events. We propose that if a change of cell fate is triggered by the time-integral of the underlying stochastic biochemical signal, rather than the original signal, then a dramatic improvement in temporal specificity results.

This work presents a study on the extraction and analysis of a set of 101 categories of eye movement features from three types of eye movement events: fixations, saccades, and post-saccadic oscillations. The eye movements were recorded during a reading task. For the categories of features with multiple instances in a recording we extract corresponding feature subtypes by calculating descriptive statistics on the distributions of these instances.
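Deriving feature subtypes from per-event distributions can be sketched as below. The particular statistics and the naming scheme are illustrative assumptions, not the study's feature list.

```python
from statistics import mean, median, stdev


def feature_subtypes(name, instances):
    """Summarize one feature category (e.g. fixation durations) by descriptive statistics.

    Returns {f"{name}_{stat}": value} for a hypothetical set of statistics.
    """
    return {
        f"{name}_mean": mean(instances),
        f"{name}_median": median(instances),
        f"{name}_std": stdev(instances) if len(instances) > 1 else 0.0,
        f"{name}_min": min(instances),
        f"{name}_max": max(instances),
    }
```

Applying such a function to every multi-instance category turns a variable-length list of events per recording into a fixed-length feature vector.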

Gene tree/species tree reconciliation is a recent, decisive advance in phylogenetic methods, accounting for possible differences between gene histories and species histories. Reconciliation consists of explaining these differences by gene-scale events such as duplication, loss and transfer, which translates mathematically into a mapping between gene tree nodes and species tree nodes or branches. Gene conversion is a very frequent biological event, which results in the replacement of a gene by a copy of another from the same species and in the same gene tree.
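For context, classic duplication-loss reconciliation maps each gene-tree node to the last common ancestor (LCA) of its children's species. The sketch below implements that mapping and labels duplications. It assumes binary trees in a hypothetical nested-tuple encoding. It does not model losses, transfers or gene conversion, the event this paper adds.

```python
def _ancestors(node, parent):
    """Path from a species-tree node up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def _lca(a, b, parent):
    anc_a = set(_ancestors(a, parent))
    for node in _ancestors(b, parent):
        if node in anc_a:
            return node
    raise ValueError("nodes are not in the same species tree")


def lca_reconcile(gene_tree, species_of, species_parent):
    """LCA mapping of a binary gene tree onto a species tree.

    gene_tree: nested 2-tuples whose leaves are gene names (str).
    species_of: gene name -> species-tree leaf.
    species_parent: species-tree node -> its parent (the root has no entry).
    Returns (species node of the gene-tree root, post-order list of events).
    """
    events = []

    def visit(node):
        if isinstance(node, str):
            return species_of[node]
        left, right = node
        m_left, m_right = visit(left), visit(right)
        m = _lca(m_left, m_right, species_parent)
        # A node mapped to the same species as one of its children is a duplication.
        events.append(("duplication" if m in (m_left, m_right) else "speciation", m))
        return m

    return visit(gene_tree), events
```

For the gene tree (((a1, b1), (a2, b2)), c1) on the species tree ((A, B), C), the mapping infers one duplication at the ancestor of A and B, followed by two speciations.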

We introduce a new dominance concept consisting of three new dominance metrics based on Lloyd's (1967) mean crowding index. The new metrics link communities and species, whereas existing ones are applicable only to communities. Our community-level metric is a function of Simpson's diversity index.
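Lloyd's mean crowding and Simpson's index share the same x(x - 1) form, which the sketch below makes explicit. It implements the standard definitions only. The new species- and community-level metrics proposed in the paper are not reproduced.

```python
def mean_crowding(counts):
    """Lloyd's mean crowding m* = sum x(x - 1) / sum x over sampling units."""
    total = sum(counts)
    return sum(x * (x - 1) for x in counts) / total


def patchiness(counts):
    """Lloyd's patchiness index m*/m (about 1 for randomly, Poisson-distributed individuals)."""
    m = sum(counts) / len(counts)
    return mean_crowding(counts) / m


def simpson_index(abundances):
    """Probability that two individuals drawn without replacement share a species."""
    n = sum(abundances)
    return sum(a * (a - 1) for a in abundances) / (n * (n - 1))
```

Mean crowding can equivalently be written as m + σ²/m - 1, where m is the mean and σ² the population variance of counts per sampling unit.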

Dextran sulfate is a semi-synthetic, polydisperse sulfated polysaccharide with important applications in clinical practice, in the manufacturing of plasma-derived protein therapeutics and in biomedical research. The sensitive detection of dextran sulfate is relevant to preclinical and clinical drug development projects, the quality control of pharmaceutical formulations, and process control in plasma fractionation using dextran sulfate-modified chromatographic columns. Most analytical methods for the sensitive detection of dextran sulfate require multistep protocols and have not been transferred into commercial formats.

Advances in molecular biology are enabling rapid and efficient analyses for effective intervention in domains such as biology research, infectious disease management, food safety, and biodefense. The emergence of microfluidics and nanotechnologies has enabled both new capabilities and instrument sizes practical for point-of-care. It has also introduced new functionality, enhanced sensitivity, and reduced the time and cost involved in conventional molecular diagnostic techniques.

The concept of dynamical compensation has been recently introduced to describe the ability of a biological system to keep its output dynamics unchanged in the face of varying parameters. Here we show that, according to its original definition, dynamical compensation is equivalent to lack of structural identifiability. This is relevant if model parameters need to be estimated, which is often the case in biological modelling.

Most human protein-coding genes can be transcribed into multiple possible distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and exist in tissue- and sample-specific frequencies.

The microtubule (MT) motor Kip3p is a highly processive kinesin that promotes catastrophes and pausing, in particular on cortical contact. These properties explain the role of Kip3p in positioning the mitotic spindle in budding yeast and potentially other processes controlled by kinesin-8 family members. We present a theoretical approach to the positioning of an MT network in a cell.

Motivation: Metabolomics data is typically scaled to a common reference like a constant volume of body fluid, a constant creatinine level, or a constant area under the spectrum. Such normalization of the data, however, may affect the selection of biomarkers and the biological interpretation of results in unforeseen ways. Results: First, we study how the outcome of hypothesis tests for differential metabolite concentration is affected by the choice of scale.
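A small example makes the concern concrete. Take two hypothetical spectra that differ only in one metabolite. Normalizing to a constant total area makes an unchanged metabolite appear to decrease. Normalizing to a fixed reference, for example creatinine, does not, provided the reference level is the same in both samples.

```python
def normalize_total_area(spectrum):
    """Scale intensities so they sum to 1 (constant area under the spectrum)."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def normalize_to_reference(spectrum, reference_level):
    """Divide every intensity by a reference level, e.g. a creatinine concentration."""
    return [v / reference_level for v in spectrum]


# Hypothetical spectra: only the third metabolite differs between the samples.
sample_a = [10.0, 20.0, 70.0]
sample_b = [10.0, 20.0, 170.0]
area_a, area_b = normalize_total_area(sample_a), normalize_total_area(sample_b)
# area_a[0] is 0.1 but area_b[0] is 0.05: metabolite 1 now looks halved.
```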

Camera traps are a relatively new but already popular tool for estimating the abundance of non-identifiable animals. Although camera traps are convenient to use, there remain both theoretical complications, such as spatial autocorrelation or the false-negative problem, and practical difficulties, for example, laborious random sampling. In this article we propose an alternative way to bypass these problems.

Motivation: We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable through better visualization.

This paper introduces a nonparametric copula-based approach for detecting the strength and monotonicity of linear and nonlinear statistical dependence between bivariate continuous, discrete or hybrid random variables and stochastic signals, termed CIM. We show that CIM satisfies the data processing inequality and is consequently a self-equitable metric. Simulation results using synthetic datasets reveal that the CIM compares favorably to other state-of-the-art statistical dependency metrics, including the Maximal Information Coefficient (MIC), Randomized Dependency Coefficient (RDC), distance Correlation (dCor), Copula correlation (Ccor), and Copula Statistic (CoS) in both statistical power and sample size requirements.

Understanding the relationship between spontaneous stochastic fluctuations and the topology of the underlying gene regulatory network is fundamental for the study of gene expression, especially at the molecular level. Here by solving the analytical steady-state distribution of the protein copy number in a general kinetic model of gene expression, we reveal a quantitative relation between stochastic fluctuations and feedback regulation at the single-molecule level, which provides novel insights into how and to what extent a feedback loop can enhance or suppress molecular fluctuations. Based on such relation, we also develop an effective method to extract the topological information of gene regulatory networks from single-cell gene expression data.
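The link between feedback and fluctuations can be explored numerically with a stochastic simulation. The sketch below applies Gillespie's algorithm to a generic birth-death model of protein copy number. It does not reproduce the paper's general kinetic model or its analytical solution, and the rate constants are arbitrary. Constitutive production should give a Fano factor near 1 (Poisson statistics), and negative autoregulation a value below 1.

```python
import random


def simulate_fano(production, gamma=1.0, t_end=1000.0, burn_in=50.0, seed=0):
    """Gillespie simulation of a birth-death protein model.

    production(n): birth propensity given current copy number n (must be > 0).
    The degradation propensity is gamma * n. Returns the time-averaged mean and
    Fano factor (variance / mean) of n after the burn-in period.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    w_sum = w_n = w_n2 = 0.0
    while t < t_end:
        birth, death = production(n), gamma * n
        total = birth + death
        dt = rng.expovariate(total)
        # Time-weight the current state over [t, t + dt), clipped to [burn_in, t_end].
        start, stop = max(t, burn_in), min(t + dt, t_end)
        if stop > start:
            w = stop - start
            w_sum += w
            w_n += w * n
            w_n2 += w * n * n
        t += dt
        n += 1 if rng.random() * total < birth else -1
    mean = w_n / w_sum
    return mean, (w_n2 / w_sum - mean ** 2) / mean


constitutive = simulate_fano(lambda n: 50.0)
negative_feedback = simulate_fano(lambda n: 100.0 / (1.0 + n / 25.0))
```

Comparing the two Fano factors gives a crude numerical counterpart to the analytical relation between feedback and noise described in the abstract.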