# Ji Liu - CSL

## Contact Details

NameJi Liu |
||

AffiliationCSL |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesMathematics - Optimization and Control (14) Computer Science - Learning (12) Mathematics - Combinatorics (10) Mathematics - Number Theory (10) Statistics - Machine Learning (9) Computer Science - Distributed; Parallel; and Cluster Computing (4) Solar and Stellar Astrophysics (3) High Energy Astrophysical Phenomena (3) Mathematics - Dynamical Systems (3) Computer Science - Multiagent Systems (2) Astrophysics of Galaxies (1) Quantum Physics (1) Computer Science - Computer Science and Game Theory (1) Physics - Atomic and Molecular Clusters (1) Physics - Statistical Mechanics (1) Physics - Atomic Physics (1) Computer Science - Numerical Analysis (1) Physics - Chemical Physics (1) Mathematics - Information Theory (1) Physics - Physics and Society (1) Computer Science - Artificial Intelligence (1) Computer Science - Computer Vision and Pattern Recognition (1) Computer Science - Information Theory (1) Statistics - Theory (1) Mathematics - Statistics (1) Quantitative Biology - Molecular Networks (1) |

## Publications Authored By Ji Liu

Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies on high communication cost on the central node. Motivated by this, we ask, can decentralized algorithms be faster than its centralized counterpart? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. Read More

The state-of-the-art online learning approaches is only capable of learning the metric for predefined tasks. In this paper, we consider lifelong learning problem to mimic "human learning", i.e. Read More

This paper considers the multi-armed thresholding bandit problem -- identifying all arms above a predefined threshold via as few pulls (or rounds) as possible -- proposed by Locatelli et al. [2016] recently. Although the proposed algorithm in Locatelli et al. Read More

The cardinality constraint is an intrinsic way to restrict the solution structure in many domains, for example, sparse learning, feature selection, and compressed sensing. To solve a cardinality constrained problem, the key challenge is to solve the projection onto the cardinality constraint set, which is NP-hard in general when there exist multiple overlapped cardiaiality constraints. In this paper, we consider the scenario where overlapped cardinality constraints satisfy a Three-view Cardinality Structure (TVCS), which reflects the natural restriction in many applications, such as identification of gene regulatory networks and task-worker assignment problem. Read More

This paper studies a recently proposed continuous-time distributed self-appraisal model with time-varying interactions among a network of $n$ individuals which are characterized by a sequence of time-varying relative interaction matrices. The model describes the evolution of the social-confidence levels of the individuals via a reflected appraisal mechanism in real time. We first show by example that when the relative interaction matrices are stochastic (not doubly stochastic), the social-confidence levels of the individuals may not converge to a steady state. Read More

This paper analyses the DeGroot-Friedkin model for evolution of the individuals' social powers in a social network when the network topology varies dynamically (described by dynamic relative interaction matrices). The DeGroot-Friedkin model describes how individual social power (self-appraisal, self-weight) evolves as a network of individuals discuss a sequence of issues. We seek to study dynamically changing relative interactions because interactions may change depending on the issue being discussed. Read More

According to the DeGroot-Friedkin model of a social network, an individual's social power evolves as the network discusses individual opinions over a sequence of issues. Under mild assumptions on the connectivity of the network, the social power of every individual converges to a constant strictly positive value as the number of issues discussed increases. If the network has a special topology, termed "star topology", then all social power accumulates with the individual at the centre of the star. Read More

Epidemic processes are used commonly for modeling and analysis of biological networks, computer networks, and human contact networks. The idea of competing viruses has been explored recently, motivated by the spread of different ideas along different social networks. Previous studies of competitive viruses have focused only on two viruses and on static graph structures. Read More

Identifying significant location categories visited by mobile users is the key to a variety of applications. This is an extremely challenging task due to the possible deviation between the estimated location coordinate and the actual location, which could be on the order of kilometers. To estimate the actual location category more precisely, we propose a novel tensor factorization framework, through several key observations including the intrinsic correlations between users, to infer the most likely location categories within the location uncertainty circle. Read More

We establish four supercongruences between truncated ${}_3F_2$ hypergeometric series involving $p$-adic Gamma functions, which extend some of the Rodriguez-Villegas supercongruences. Read More

In a recent paper, a distributed algorithm was proposed for solving linear algebraic equations of the form $Ax = b$ assuming that the equation has at least one solution. The equation is presumed to be solved by $m$ agents assuming that each agent knows a subset of the rows of the matrix $[A \; b]$, the current estimates of the equation's solution generated by each of its neighbors, and nothing more. Neighbor relationships are represented by a time-dependent directed graph $N(t)$ whose vertices correspond to agents and whose arcs characterize neighbor relationships. Read More

By the distributed averaging problem is meant the problem of computing the average value of a set of numbers possessed by the agents in a distributed network using only communication between neighboring agents. Gossiping is a well-known approach to the problem which seeks to iteratively arrive at a solution by allowing each agent to interchange information with at most one neighbor at each iterative step. Crafting a gossiping protocol which accomplishes this is challenging because gossiping is an inherently collaborative process which can lead to deadlocks unless careful precautions are taken to ensure that it does not. Read More

Using an identity due to Gessel and Stanton and some properties of the $p$-adic Gamma function, we establish a $p$-adic supercongruence for truncated hypergeometric series ${}_7F_6$. Some related supercongruences are also deduced from it. Read More

Let $p\ge 5$ be a prime and $\langle a\rangle_p$ denote the least non-negative integer $r$ with $a\equiv r \pmod{p}$. We mainly prove that for any $p$-adic integer $a$ with $\langle a\rangle_p\equiv 0 \pmod{2}$ \begin{align*} \sum_{k=0}^{p-1}\frac{\left(\frac{1}{2}\right)_k(-a)_k(a+1)_k}{(1)_k^3}\equiv (-1)^{\frac{p+1}{2}}\Gamma_p\left(-\frac{a}{2}\right)^2\Gamma_p\left(\frac{a+1}{2}\right)^2 \pmod{p^2}, \end{align*} where $(x)_k=x(x+1)\cdots (x+k-1)$ and $\Gamma_p(\cdot)$ denotes the $p$-adic Gamma function. This partially extends the four Rodriguez-Villegas supercongruences for truncated hypergeometric series ${}_3F_2$. Read More

**Authors:**Wei Zhang, Minwei Feng, Yunhui Zheng, Yufei Ren, Yandong Wang, Ji Liu, Peng Liu, Bing Xiang, Li Zhang, Bowen Zhou

Deep learning (DL) training-as-a-service (TaaS) is an important emerging industrial workload. The unique challenge of TaaS is that it must satisfy a wide range of customers who have no experience and resources to tune DL hyper-parameters, and meticulous tuning for each user's dataset is prohibitively expensive. Therefore, TaaS hyper-parameters must be fixed with values that are applicable to all users. Read More

Recently there has been significant interest in training machine-learning models at low precision: by reducing precision, one can reduce computation and communication by one order of magnitude. We examine training at reduced precision, both from a theoretical and practical perspective, and ask: is it possible to train models at end-to-end low precision with provable guarantees? Can this lead to consistent order-of-magnitude speedups? We present a framework called ZipML to answer these questions. For linear models, the answer is yes. Read More

Surgical Site Infection (SSI) is a national priority in healthcare research. Much research attention has been attracted to develop better SSI risk prediction models. However, most of the existing SSI risk prediction models are built on static risk factors such as comorbidities and operative factors. Read More

In this paper, we consider linear state-space models with compressible innovations and convergent transition matrices in order to model spatiotemporally sparse transient events. We perform parameter and state estimation using a dynamic compressed sensing framework and develop an efficient solution consisting of two nested Expectation-Maximization (EM) algorithms. Under suitable sparsity assumptions on the innovations, we prove recovery guarantees and derive confidence bounds for the state estimates. Read More

Magnetic cataclysmic variables (CVs) contain a white dwarf with magnetic field strong enough to control the accretion flow from a late type secondary. In this paper, we discover a magnetic CV (CXOGSG J215544.4+380116) from the $Chandra$ archive data. Read More

The stochastic composition optimization proposed recently by Wang et al. [2014] minimizes the objective with the compositional expectation form: $\min_x~(\mathbb{E}_iF_i \circ \mathbb{E}_j G_j)(x).$ It summarizes many important applications in machine learning, statistics, and finance. Read More

Rodriguez-Villegas conjectured 22 supercongruences for hypergeometric Calabi-Yau manifolds of dimension $D\le 3$. For manifolds of dimension $D=2$, he posed four conjectural supercongruences on truncated generalized hypergeometric series ${}_3F_2$. These four conjectures were gradually confirmed by several authors. Read More

Motivation: Gene regulatory interactions are of fundamental importance to various biological functions and processes. However, only a few previous computational studies have claimed success in revealing genome-wide regulatory landscapes from temporal gene expression data, especially for complex eukaryotes like human. Moreover, recent work suggests that these methods still suffer from the curse of dimensionality if network size increases to 100 or higher. Read More

We develop a new statistical machine learning paradigm, named infinite-label learning, to annotate a data point with more than one relevant labels from a candidate set, which pools both the finite labels observed at training and a potentially infinite number of previously unseen labels. The infinite-label learning fundamentally expands the scope of conventional multi-label learning, and better models the practical requirements in various real-world applications, such as image tagging, ads-query association, and article categorization. However, how can we learn a labeling function that is capable of assigning to a data point the labels omitted from the training set? To answer the question, we seek some clues from the recent work on zero-shot learning, where the key is to represent a class/label by a vector of semantic codes, as opposed to treating them as atomic labels. Read More

Open physical systems with balanced loss and gain exhibit a transition, absent in their solitary counterparts, which engenders modes that exponentially decay or grow with time and thus spontaneously breaks the parity-time PT symmetry. This PT-symmetry breaking is induced by modulating the strength or the temporal profile of the loss and gain, but also occurs in a pure dissipative system without gain. It has been observed that, in classical systems with mechanical, electrical, and electromagnetic setups with static loss and gain, the PT-symmetry breaking transition leads to extraordinary behavior and functionalities. Read More

Consider the stochastic composition optimization problem where the objective is a composition of two expected-value functions. We propose a new stochastic first-order method, namely the accelerated stochastic compositional proximal gradient (ASC-PG) method, which updates based on queries to the sampling oracle using two different timescales. The ASC-PG is the first proximal gradient method for the stochastic composition problem that can deal with nonsmooth regularization penalty. Read More

Let the numbers $\alpha_n,\beta_n$ and $\gamma_n$ denote \begin{align*} \alpha_n=\sum_{k=0}^{n-1}{2k\choose k},\quad \beta_n=\sum_{k=0}^{n-1}{2k\choose k}\frac{1}{k+1}\quad\text{and}\quad \gamma_n=\sum_{k=0}^{n-1}{2k\choose k}\frac{3k+2}{k+1}, \end{align*} respectively. We prove that for any prime $p\ge 5$ and positive integer $n$ \begin{align*} \alpha_{np}&\equiv \left(\frac{p}{3}\right) \alpha_n \pmod{p^2},\\ \beta_{np}&\equiv \begin{cases} \displaystyle \beta_n \pmod{p^2},\quad &\text{if $p\equiv 1\pmod{3}$},\\ -\gamma_n \pmod{p^2},\quad &\text{if $p\equiv 2\pmod{3}$}, \end{cases} \end{align*} where $\left(\frac{\cdot}{p}\right)$ denotes the Legendre symbol. These two supercongruences were recently conjectured by Apagodu and Zeilberger. Read More

Asynchronous parallel optimization received substantial successes and extensive attention recently. One of core theoretical questions is how much speedup (or benefit) the asynchronous parallelization can bring us. This paper provides a comprehensive and generic analysis to study the speedup property for a broad range of asynchronous parallel stochastic algorithms from the zeroth order to the first order methods. Read More

Rodriguez-Villegas conjectured four supercongruences associated to certain elliptic curves, which were first confirmed by Mortenson by using the Gross-Koblitz formula. In this paper, we aim to prove four supercongruences between two truncated hypergeometric series ${}_2F_1$. This generalizes these four Rodriguez-Villegas supercongruences. Read More

Motivated by the spread of opinions on different social networks, we study a distributed continuous-time bi-virus model for a system of groups of individuals. An in-depth stability analysis is performed for more general models than have been previously considered, for the healthy and epidemic states. In addition, we investigate sensitivity properties of some nontrivial equilibria and obtain an impossibility result for distributed feedback control. Read More

We present the detection of Cepheids in the barred spiral galaxy NGC 1313, using the Wide Field and Planetary Camera 2 on the Hubble Space Telescpoe. Twenty B (F450W) and V (F555W) epochs of observations spanning over three weeks were obtained, on which the profile-fitting photometry of all stars in the monitored field was performed using the package HSTphot. A sample of 26 variable stars have been identified to be Cepheids, with periods between 3 and 14 days. Read More

Based on Kepler data, we present the results of a search for white-light flares on 1049 close binaries. We identify 234 flare binaries, on which 6818 flares are detected. We compare the flare-binary fraction in different binary morphologies ("detachedness"). Read More

By using the Rodriguez-Villegas-Mortenson supercongruences, we prove four supercongruences on sums involving binomial coefficients, which were originally conjectured by Sun. We also confirm a related conjecture of Guo on integer-valued polynomials. Read More

We propose a geometrically thick, super-Eddington accretion disk model, where an optically thick wind is not necessary, to understand ultraluminous supersoft sources (ULSs). For high mass accretion rates $\dot M \ga 30\dot M_{\rm Edd}$ and not small inclination angles $\theta \ga 25^{\circ}$, where $\dot M_{\rm Edd}$ is the Eddington accretion rate, the hard photons from the hot inner region may be shaded by the geometrically thick inner disk, and therefore only the soft photons from the outer thin disk and the outer photosphere of the thick disk can reach the observer. Our model can naturally explain the approximate relation between the typical thermal radius and the thermal temperature, $R_{\rm bb} \propto T_{\rm bb}^{-2}$. Read More

The Delannoy numbers and Schr\"oder numbers are given by \begin{align*} D_n=\sum_{k=0}^n{n\choose k}{n+k\choose k}\quad \text{and}\quad S_n=\sum_{k=0}^n{n\choose k}{n+k\choose k}\frac{1}{k+1}, \end{align*} respectively. Let $p>3$ be a prime. We mainly prove that \begin{align*} \sum_{k=1}^{p-1}D_k S_k\equiv 2p^3B_{p-3}-2pH^{*}_{p-1} \pmod{p^4}, \end{align*} where $B_n$ is the $n$-th Bernoulli number and those $H^{*}_n$ are the alternating harmonic numbers given by $H^{*}_n=\sum_{k=1}^{n}\frac{(-1)^k}{k}$. Read More

This paper considers the discrete-time version of Altafini's model for opinion dynamics in which the interaction among a group of agents is described by a time-varying signed digraph. Prompted by an idea from [1], exponential convergence of the system is studied using a graphical approach. Necessary and sufficient conditions for exponential convergence with respect to each possible type of limit states are provided. Read More

Teaching dimension is a learning theoretic quantity that specifies the minimum training set size to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimension for ridge regression, support vector machines, and logistic regression. Read More

We demonstrate a novel technique for cooling a degenerate Fermi gas in a crossed-beam optical dipole trap, where high-energy atoms can be selectively removed from the trap by modulating the stiffness of the trapping potential with anharmonic trapping frequencies. We measure the dependence of the cooling effect on the frequency and amplitude of the parametric modulations. It is found that the large anharmonicity along the axial trapping potential allows to generate a degenerate Fermi gas with anisotropic energy distribution, in which the cloud energy in the axial direction can be reduced to the ground state value. Read More

The formation of relativistic jets by an accreting compact object is one of the fundamental mysteries of astrophysics. While the theory is poorly understood, observations of relativistic jets from systems known as microquasars have led to a well-established phenomenology. Relativistic jets are not expected from sources with soft or supersoft X-ray spectra, although two such systems are known to produce relatively low-velocity bipolar outflows. Read More

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD (ASGD) has been widely adopted for accelerating the training of large-scale deep networks in a distributed computing environment. However, in practice it is quite challenging to tune the training hyperparameters (such as learning rate) when using ASGD so as achieve convergence and linear speedup, since the stability of the optimization algorithm is strongly influenced by the asynchronous nature of parameter updates. Read More

Many practical applications such as gene expression analysis, multi-task learning, image recognition, signal processing, and medical data analysis pursue a sparse solution for the feature selection purpose and particularly favor the nonzeros \emph{evenly} distributed in different groups. The exclusive sparsity norm has been widely used to serve to this purpose. However, it still lacks systematical studies for exclusive sparsity norm optimization. Read More

In the set of stochastic, indecomposable, aperiodic (SIA) matrices, the class of stochastic Sarymsakov matrices is the largest known subset (i) that is closed under matrix multiplication and (ii) the infinitely long left-product of the elements from a compact subset converges to a rank-one matrix. In this paper, we show that a larger subset with these two properties can be derived by generalizing the standard definition for Sarymsakov matrices. The generalization is achieved either by introducing an "SIA index", whose value is one for Sarymsakov matrices, and then looking at those stochastic matrices with larger SIA indices, or by considering matrices that are not even SIA. Read More

This paper studies a multi-period demand response management problem in the smart grid where multiple utility companies compete among themselves. The user-utility interactions are modeled by a noncooperative game of a Stackelberg type where the interactions among the utility companies are captured through a Nash equilibrium. It is shown that this game has a unique Stackelberg equilibrium at which the utility companies set prices to maximize their revenues (within a Nash game) while the users respond accordingly to maximize their utilities subject to their budget constraints. Read More

We study theoretically the quantum dynamics of nitrogen molecules (N$_2$) exposed to intense and ultrafast x-rays at a wavelength of 1.1 nm (1100 eV photon energy) from the Linac Coherent Light Source (LCLS) free electron laser. Molecular rate equations are derived to describe the intertwined photoionization, decay, and dissociation processes occurring for N$_2$. Read More

We show that, along the critical isobar, the complete scaling results in a unique leading scaling qualitatively distinct to that arising from the simple and the revised scalings. This is verified by a complete-field finite-time scaling theory, which combines the complete scaling with finite-time scaling, and its application to the molecular dynamics simulations of the vapor-liquid critical point of a three-dimensional one-component Lennard-Jones fluid in an isobaric-isothermal ensemble with linear heating or cooling. Both the static and the dynamic critical exponents as well as the critical parameters can be estimated without \emph{a priori} knowledge of the universality class. Read More

Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in solving deep neural network and received many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism. To fill the gaps in theory and provide theoretical supports, this paper studies two asynchronous parallel implementations of SG: one is on the computer network and the other is on the shared memory system. Read More

We prove that, if $m,n\geqslant 1$ and $a_1,\ldots,a_m$ are nonnegative integers, then \begin{align*} \frac{[a_1+\cdots+a_m+1]!}{[a_1]!\ldots[a_m]!}\sum^{n-1}_{h=0}q^h\prod_{i=1}^m{h\brack a_i} \equiv 0\pmod{[n]}, \end{align*} where $[n]=\frac{1-q^n}{1-q}$, $[n]!=[n][n-1]\cdots[1]$, and ${a\brack b}=\prod_{k=1}^b\frac{1-q^{a-k+1}}{1-q^k}$. The $a_1=\cdots=a_m$ case confirms a recent conjecture of Z.-W. Read More

We consider in this paper a networked system of opinion dynamics in continuous time, where the agents are able to evaluate their self-appraisals in a distributed way. In the model we formulate, the underlying network topology is described by a rooted digraph. For each ordered pair of agents $(i,j)$, we assign a function of self-appraisal to agent $i$, which measures the level of importance of agent $i$ to agent $j$. Read More

This paper studies the opinion dynamics that result when individuals consecutively discuss a sequence of issues. Specifically, we study how individuals' self-confidence levels evolve via a reflected appraisal mechanism. Motivated by the DeGroot-Friedkin model, we propose a Modified DeGroot-Friedkin model which allows individuals to update their self-confidence levels by only interacting with their neighbors and in particular, the modified model allows the update of self-confidence levels to take place in finite time without waiting for the opinion process to reach a consensus on any particular issue. Read More

A distributed algorithm is described for solving a linear algebraic equation of the form $Ax=b$ assuming the equation has at least one solution. The equation is simultaneously solved by $m$ agents assuming each agent knows only a subset of the rows of the partitioned matrix $(A,b)$, the current estimates of the equation's solution generated by its neighbors, and nothing more. Each agent recursively updates its estimate by utilizing the current estimates generated by each of its neighbors. Read More

The numbers $R_n$ and $W_n$ are defined as \begin{align*} R_n=\sum_{k=0}^{n}{n+k\choose 2k}{2k\choose k}\frac{1}{2k-1},\ \text{and}\ W_n=\sum_{k=0}^{n}{n+k\choose 2k}{2k\choose k}\frac{3}{2k-3}. \end{align*} We prove that, for any positive integer $n$ and odd prime $p$, there hold \begin{align*} \sum_{k=0}^{n-1}(2k+1)R_k^2 &\equiv 0 \pmod{n}, \\ \sum_{k=0}^{p-1}(2k+1)R_k^2 &\equiv 4p(-1)^{\frac{p-1}{2}} -p^2 \pmod{p^3}, \\ 9\sum_{k=0}^{n-1}(2k+1)W_k^2 &\equiv 0 \pmod{n}, \\ \sum_{k=0}^{p-1}(2k+1)W_k^2 &\equiv 12p(-1)^{\frac{p-1}{2}}-17p^2 \pmod{p^3}, \quad\text{if $p>3$.} \end{align*} The first two congruences were originally conjectured by Z. Read More