# Hongyang Zhang

## Pub Categories

- Statistics - Machine Learning (7)
- Computer Science - Learning (6)
- Computer Science - Data Structures and Algorithms (2)
- Mathematics - Optimization and Control (2)
- Mathematics - Information Theory (1)
- Computer Science - Numerical Analysis (1)
- Mathematics - Numerical Analysis (1)
- Statistics - Theory (1)
- Computer Science - Artificial Intelligence (1)
- Mathematics - Statistics (1)
- Computer Science - Information Theory (1)

## Publications Authored By Hongyang Zhang

We study the strong duality of non-convex matrix factorization: we show that, under certain dual conditions, non-convex matrix factorization and its dual have the same optimum. Strong duality is well understood for convex optimization, but little was known for matrix factorization. We formalize the strong duality of matrix factorization through a novel analytical framework, and show that the duality gap is zero for a wide class of matrix factorization problems.

We study the problem of interactively learning a binary classifier using noisy labeling and pairwise comparison oracles, where the comparison oracle answers which of two given instances is more likely to be positive. Learning from such oracles has multiple applications where obtaining direct labels is harder but pairwise comparisons are easier, and the algorithm can leverage both types of oracles. In this paper, we attempt to characterize how access to an easier comparison oracle helps improve the label complexity and the total query complexity.

We provide new results concerning noise-tolerant and sample-efficient learning algorithms under $s$-concave distributions over $\mathbb{R}^n$ for $-\frac{1}{2n+3}\le s\le 0$. The new class of $s$-concave distributions is a broad and natural generalization of log-concavity, and includes many important additional distributions, e.g. ...
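For orientation, the usual textbook definition of $s$-concavity is recalled here; it is not quoted from the abstract, and the symbols $f$, $x$, $y$, and $\lambda$ are introduced only for this illustration. A density $f$ on $\mathbb{R}^n$ is $s$-concave for $s<0$ if

$$
f(\lambda x + (1-\lambda) y) \;\ge\; \left(\lambda f(x)^{s} + (1-\lambda) f(y)^{s}\right)^{1/s}
\quad \text{for all } x, y \in \mathbb{R}^n,\ \lambda \in [0,1].
$$

As $s \to 0$, the right-hand side tends to $f(x)^{\lambda} f(y)^{1-\lambda}$, which gives the log-concave case.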

We study the problem of recovering an incomplete $m\times n$ matrix of rank $r$ with columns arriving online over time. This is known as the problem of life-long matrix completion, and is widely applied in recommendation systems, computer vision, system identification, etc. The challenge is to design provable algorithms that tolerate a large amount of noise and have small sample complexity.

We propose and analyze two algorithms for maintaining approximate Personalized PageRank (PPR) vectors on a dynamic graph, where edges are added or deleted. Our algorithms are natural dynamic versions of two known local variations of power iteration. One, Forward Push, propagates probability mass forwards along edges from a source node, while the other, Reverse Push, propagates local changes backwards along edges from a target.
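As a rough illustration of the Forward Push idea, here is a minimal sketch of the well-known static variant, not the dynamic algorithm this paper proposes. The function name, the teleport probability `alpha`, the tolerance `eps`, and the adjacency-dict graph representation are all assumptions made for the example; every node is assumed to have at least one out-neighbor.

```python
from collections import defaultdict, deque


def forward_push(graph, source, alpha=0.15, eps=1e-4):
    """Approximate PPR from `source` on a static directed graph.

    graph: dict mapping each node to a non-empty list of out-neighbors
    (hypothetical representation chosen for this sketch).
    Returns (estimate, residual) as plain dicts.
    """
    estimate = defaultdict(float)   # settled probability mass
    residual = defaultdict(float)   # mass still waiting to be pushed
    residual[source] = 1.0
    queue = deque([source])

    while queue:
        u = queue.popleft()
        out = graph[u]
        r = residual[u]
        # Skip stale queue entries whose residual is already small.
        if r <= eps * len(out):
            continue
        # Keep an alpha fraction at u, spread the rest over out-edges.
        estimate[u] += alpha * r
        residual[u] = 0.0
        share = (1.0 - alpha) * r / len(out)
        for v in out:
            residual[v] += share
            if residual[v] > eps * len(graph[v]):
                queue.append(v)

    return dict(estimate), dict(residual)


# Usage on a tiny three-node graph.
graph = {0: [1, 2], 1: [2], 2: [0]}
ppr, leftover = forward_push(graph, source=0)
```

Each push conserves probability mass, so the estimates and residuals together always sum to 1, and on termination every residual is at most `eps` times the node's out-degree.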

Cellwise outliers are likely to occur together with casewise outliers in modern data sets with relatively large dimension. Recent work has shown that traditional robust regression methods may fail for data sets in this paradigm. The proposed method, called three-step regression, proceeds as follows: first, it uses a consistent univariate filter to detect and eliminate extreme cellwise outliers; second, it applies a robust estimator of multivariate location and scatter to the filtered data to down-weight casewise outliers; third, it computes robust regression coefficients from the estimates obtained in the second step.
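The third step can be sketched as follows: given a location vector and scatter matrix for the joint variables (predictors followed by the response), regression coefficients follow from the partitioned scatter matrix. This is only an illustration under stated assumptions. The function name is hypothetical, and the classical sample mean and covariance stand in for the robust estimator of the second step, so the result here coincides with ordinary least squares rather than with the paper's robust fit.

```python
import numpy as np


def regression_from_location_scatter(mu, sigma):
    """Coefficients from joint location `mu` and scatter `sigma` of (x, y).

    The response y is assumed to be the last coordinate.
    Returns (intercept, beta).
    """
    mu_x, mu_y = mu[:-1], mu[-1]
    sigma_xx = sigma[:-1, :-1]      # scatter among predictors
    sigma_xy = sigma[:-1, -1]       # scatter between predictors and response
    beta = np.linalg.solve(sigma_xx, sigma_xy)
    intercept = mu_y - mu_x @ beta
    return intercept, beta


# Synthetic data; the plug-in estimates below are NOT robust (assumption
# for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(200)
Z = np.column_stack([X, y])
intercept, beta = regression_from_location_scatter(
    Z.mean(axis=0), np.cov(Z, rowvar=False)
)
```

Swapping in robust location and scatter estimates computed from the filtered data would leave this final step unchanged.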

Subspace recovery from corrupted and missing data is crucial for various applications in signal processing and information theory. To complete missing values and detect column corruptions, existing robust Matrix Completion (MC) methods mostly concentrate on recovering a low-rank matrix from few corrupted coefficients w.r. ...

Recovering intrinsic low-dimensional subspaces from data distributed on them is a key preprocessing step in many applications. In recent years, much work has modeled subspace recovery as a low-rank minimization problem. We find that some representative models, such as Robust Principal Component Analysis (R-PCA), Robust Low Rank Representation (R-LRR), and Robust Latent Low Rank Representation (R-LatLRR), are actually deeply connected.

There has been a surge in the number of large and flat data sets - data sets containing a large number of features and a relatively small number of observations - due to the growing ability to collect and store information in medical research and other fields. Hierarchical clustering is a widely used clustering tool. In hierarchical clustering, large and flat data sets may allow for better coverage of clustering features (features that help explain the true underlying clusters), but such data sets usually include a large fraction of noise features (non-clustering features) that may hide the underlying clusters.

Rank minimization has attracted a lot of attention due to its robustness in data recovery. To overcome the computational difficulty, rank is often replaced with the nuclear norm. For several rank minimization problems, such a replacement has been theoretically proven to be valid, i. ...
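To make the rank-versus-nuclear-norm substitution concrete, the short sketch below computes both quantities for a small synthetic matrix. The matrix and variable names are invented for the example. The nuclear norm is the sum of the singular values, whereas the rank counts the nonzero ones.

```python
import numpy as np

# Build a 6x5 matrix of rank 2 as a product of two thin factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

singular_values = np.linalg.svd(A, compute_uv=False)

rank = np.linalg.matrix_rank(A)          # number of nonzero singular values
nuclear_norm = singular_values.sum()     # sum of singular values
```

The rank is a discontinuous, non-convex function of the matrix, while the nuclear norm is convex, which is why it is commonly used as a tractable surrogate.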