# Bicheng Ying

## Contact Details

NameBicheng Ying |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesMathematics - Optimization and Control (5) Computer Science - Multiagent Systems (5) Computer Science - Learning (4) Statistics - Machine Learning (4) Mathematics - Information Theory (1) Computer Science - Distributed; Parallel; and Cluster Computing (1) Computer Science - Information Theory (1) |

## Publications Authored By Bicheng Ying

The analysis in Part I revealed interesting properties for subgradient learning algorithms in the context of stochastic optimization when gradient noise is present. These algorithms are used when the risk functions are non-smooth and involve non-differentiable components. They have been long recognized as being slow converging methods. Read More

This work develops a distributed optimization strategy with guaranteed exact convergence for a broad class of left-stochastic combination policies. The resulting exact diffusion strategy is shown in Part II to have a wider stability range and superior convergence performance than the EXTRA strategy. The exact diffusion solution is applicable to non-symmetric left-stochastic combination matrices, while most earlier developments on exact consensus implementations are limited to doubly-stochastic matrices; these latter matrices impose stringent constraints on the network topology. Read More

Part I of this work developed the exact diffusion algorithm to remove the bias that is characteristic of distributed solutions for deterministic optimization problems. The algorithm was shown to be applicable to a larger set of combination policies than earlier approaches in the literature. In particular, the combination matrices are not required to be doubly stochastic or right-stochastic, which impose stringent conditions on the graph topology and communications protocol. Read More

In this paper, we study diffusion social learning over weakly-connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this work, a scenario of total influence (or "mind-control") arises where a set of influential agents ends up shaping the beliefs of non-influential agents. Read More

This work examines the mean-square error performance of diffusion stochastic algorithms under a generalized coordinate-descent scheme. In this setting, the adaptation step by each agent is limited to a random subset of the coordinates of its stochastic gradient vector. The selection of coordinates varies randomly from iteration to iteration and from agent to agent across the network. Read More

The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. Read More

The stochastic dual coordinate-ascent (S-DCA) technique is a useful alternative to the traditional stochastic gradient-descent algorithm for solving large-scale optimization problems due to its scalability to large data sets and strong theoretical guarantees. However, the available S-DCA formulation is limited to finite sample sizes and relies on performing multiple passes over the same data. This formulation is not well-suited for online implementations where data keep streaming in. Read More

In this work and the supporting Part II, we examine the performance of stochastic sub-gradient learning strategies under weaker conditions than usually considered in the literature. The new conditions are shown to be automatically satisfied by several important cases of interest including SVM, LASSO, and Total-Variation denoising formulations. In comparison, these problems do not satisfy the traditional assumptions used in prior analyses and, therefore, conclusions derived from these earlier treatments are not directly applicable to these problems. Read More

The paper examines the learning mechanism of adaptive agents over weakly-connected graphs and reveals an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower relationship develops with the performance of some agents being fully determined by the performance of other agents that are outside their domain of influence. Read More