# Anru Zhang

## Pub Categories

- Mathematics - Statistics (10)
- Statistics - Methodology (10)
- Statistics - Theory (10)
- Statistics - Machine Learning (6)
- Computer Science - Information Theory (4)
- Mathematics - Information Theory (4)
- Computer Science - Learning (2)
- Mathematics - Probability (1)
- Statistics - Applications (1)

## Publications Authored By Anru Zhang

Tensors, or high-order arrays, have attracted much attention in recent research. In this paper, we propose a general framework for tensor principal component analysis (tensor PCA), which focuses on the methodology and theory for extracting the hidden low-rank structure from high-dimensional tensor data. A unified solution is provided for tensor PCA with considerations in both statistical limits and computational costs.

The completion of tensors, or high-order arrays, has attracted significant attention in recent research. Current literature on tensor completion primarily focuses on recovery from a set of uniformly randomly measured entries, and the required number of measurements to achieve recovery is not guaranteed to be optimal. In addition, the implementation of some previous methods is NP-hard.

We propose a general semi-supervised inference framework focused on the estimation of the population mean. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses ("labels").

Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the sense that the missingness is not dependent on the values of the data. Based on incomplete data, estimators for bandable and sparse covariance matrices are proposed and their theoretical and numerical properties are investigated.

Perturbation bounds for singular spaces, in particular Wedin's $\sin \Theta$ theorem, are a fundamental tool in many fields including high-dimensional statistics, machine learning, and applied mathematics. In this paper, we establish separate perturbation bounds, measured in both spectral and Frobenius $\sin \Theta$ distances, for the left and right singular subspaces. Lower bounds, which show that the individual perturbation bounds are rate-optimal, are also given.
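As a rough illustration of the two distances named in this abstract, the sketch below computes spectral and Frobenius $\sin \Theta$ distances between the leading singular subspaces of a low-rank matrix and of a noisy copy. It assumes the common definition in which $\sin \Theta$ collects the sines of the principal angles between two subspaces. The function names, dimensions, and noise level are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def sin_theta_distances(U, V):
    """Return (spectral, Frobenius) sin-Theta distances between the column
    spaces of U and V. Both are assumed to have orthonormal columns and the
    same number of columns."""
    # The cosines of the principal angles are the singular values of U^T V.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against round-off above 1
    sines = np.sqrt(1.0 - cosines**2)
    return sines.max(), np.sqrt(np.sum(sines**2))

def leading_singular_subspaces(X, r):
    """Return orthonormal bases of the top-r left and right singular subspaces."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], Vt[:r].T

# Hypothetical simulation: a rank-3 signal observed with small Gaussian noise.
rng = np.random.default_rng(0)
p1, p2, r = 50, 200, 3
X = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))
Z = 0.1 * rng.standard_normal((p1, p2))

U, V = leading_singular_subspaces(X, r)
U_hat, V_hat = leading_singular_subspaces(X + Z, r)

left = sin_theta_distances(U, U_hat)    # perturbation of the left subspace
right = sin_theta_distances(V, V_hat)   # perturbation of the right subspace
print("left  (spectral, Frobenius):", left)
print("right (spectral, Frobenius):", right)
```

When the matrix is far from square, the left and right distances can differ noticeably, which is the situation in which separate bounds for the two subspaces are informative.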

One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper considers regression analysis with such compositional data as covariates. In order to satisfy the subcompositional coherence of the results, linear models with a set of linear constraints on the regression coefficients are introduced.

Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design.

Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. The minimax rate of convergence is established and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences.

Instrumental variables have been widely used for estimating the causal effect between exposure and outcome. Conventional estimation methods require complete knowledge about all the instruments' validity; a valid instrument must not have a direct effect on the outcome and not be related to unmeasured confounders. Often, this is impractical as highlighted by Mendelian randomization studies where genetic markers are used as instruments and complete knowledge about instruments' validity is equivalent to complete knowledge about the involved genes' functions.

Estimation of low-rank matrices is of significant interest in a range of contemporary applications. In this paper, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations.

This paper considers compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results.

How to bound the Gini coefficient from above, and in particular how to find its smallest upper bound, based on grouped data in the absence of income brackets remains a problem that has not been properly solved. This article provides an upper bound which is easy to compute, and provides an effective algorithm to calculate the exact value of the smallest upper bound. As illustrations, the computed bounds for the Gini coefficients of urban and rural China from 2003 to 2008 are provided.

This paper establishes new restricted isometry conditions for compressed sensing and affine rank minimization. It is shown for compressed sensing that $\delta_{k}^A+\theta_{k,k}^A < 1$ guarantees the exact recovery of all $k$-sparse signals in the noiseless case through constrained $\ell_1$ minimization. Furthermore, the upper bound 1 is sharp in the sense that for any $\epsilon > 0$, the condition $\delta_k^A + \theta_{k, k}^A < 1+\epsilon$ is not sufficient to guarantee such exact recovery using any recovery method.
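For readers unfamiliar with the two constants in this condition, they are usually defined as below. These are the standard definitions from the compressed sensing literature, stated here as an assumption rather than quoted from the paper. The symbols $\beta$, $\beta_1$, $\beta_2$ are notation introduced for this note.

$$(1-\delta_k^A)\,\|\beta\|_2^2 \;\le\; \|A\beta\|_2^2 \;\le\; (1+\delta_k^A)\,\|\beta\|_2^2 \quad \text{for all } k\text{-sparse } \beta,$$

$$|\langle A\beta_1, A\beta_2\rangle| \;\le\; \theta_{k,k}^A\,\|\beta_1\|_2\,\|\beta_2\|_2 \quad \text{for all } k\text{-sparse } \beta_1, \beta_2 \text{ with disjoint supports}.$$

In each case the constant is the smallest number for which the inequality holds, so $\delta_k^A$ measures how far $A$ is from preserving the norm of sparse vectors and $\theta_{k,k}^A$ measures how far it is from preserving orthogonality between sparse vectors with disjoint supports.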

This paper establishes a sharp condition on the restricted isometry property (RIP) for both sparse signal recovery and low-rank matrix recovery. It is shown that if the measurement matrix $A$ satisfies the RIP condition $\delta_k^A<1/3$, then all $k$-sparse signals $\beta$ can be recovered exactly via the constrained $\ell_1$ minimization based on $y=A\beta$. Similarly, if the linear map $\mathcal{M}$ satisfies the RIP condition $\delta_r^{\mathcal{M}}<1/3$, then all matrices $X$ of rank at most $r$ can be recovered exactly via the constrained nuclear norm minimization based on $b=\mathcal{M}(X)$.
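The constrained $\ell_1$ minimization mentioned in this abstract, $\min \|\beta\|_1$ subject to $A\beta = y$, can be written as a linear program. The sketch below is one minimal way to do that with `scipy.optimize.linprog`, and is not the paper's own code. The function name `basis_pursuit`, the Gaussian measurement matrix, and the problem sizes are assumptions made for illustration. The example does not verify that $\delta_k^A < 1/3$ holds for the sampled matrix, since computing RIP constants is itself hard.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||beta||_1 subject to A @ beta = y as a linear program."""
    n, p = A.shape
    # Split beta = u - v with u, v >= 0; at an optimum ||beta||_1 = sum(u) + sum(v).
    c = np.ones(2 * p)
    A_eq = np.hstack([A, -A])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]

# Hypothetical setup: a 4-sparse signal in dimension 80, observed through
# 40 Gaussian measurements without noise.
rng = np.random.default_rng(0)
n, p, k = 40, 80, 4
A = rng.standard_normal((n, p)) / np.sqrt(n)
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.standard_normal(k)
y = A @ beta

beta_hat = basis_pursuit(A, y)
print("max recovery error:", np.max(np.abs(beta_hat - beta)))
```

The split into nonnegative parts `u` and `v` is a standard way to turn the absolute values in the $\ell_1$ norm into a linear objective. The low-rank analogue replaces the $\ell_1$ norm with the nuclear norm and requires a semidefinite or other convex solver instead of a linear program.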