# Han Liu - Johns Hopkins University

## Contact Details

NameHan Liu |
||

AffiliationJohns Hopkins University |
||

CityBaltimore |
||

CountryUnited States |
||

## Pubs By Year |
||

## External Links |
||

## Pub CategoriesStatistics - Machine Learning (31) Mathematics - Statistics (12) Statistics - Theory (12) Mathematics - Optimization and Control (8) Computer Science - Learning (7) Physics - Mesoscopic Systems and Quantum Hall Effect (7) Statistics - Methodology (5) Physics - Materials Science (4) Mathematics - Information Theory (3) Computer Science - Information Theory (3) Mathematics - Complex Variables (1) Computer Science - Human-Computer Interaction (1) Computer Science - Artificial Intelligence (1) Computer Science - Graphics (1) Mathematics - Analysis of PDEs (1) Mathematics - Dynamical Systems (1) |

## Publications Authored By Han Liu

Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i. Read More

This paper studies the asymptotic behavior of smooth solutions to the generalized Hall-magneto-hydrodynamics system (1.1) with one single diffusion on the whole space $\R^3$. We establish that, in the inviscid resistive case, the energy $\|b(t)\|_2^2$ vanishes and $\|u(t)\|_2^2$ converges to a constant as time tends to infinity provided the velocity is bounded in $W^{1-\alpha,\frac3\alpha}(\R^3)$; in the viscous non-resistive case, the energy $\|u(t)\|_2^2$ vanishes and $\|b(t)\|_2^2$ converges to a constant provided the magnetic field is bounded in $W^{1-\beta,\infty}(\R^3)$. Read More

High dimensional sparse learning has imposed a great computational challenge to large scale data analysis. In this paper, we are interested in a broad class of sparse learning approaches formulated as linear programs parametrized by a {\em regularization factor}, and solve them by the parametric simplex method (PSM). Our parametric simplex method offers significant advantages over other competing methods: (1) PSM naturally obtains the complete solution path for all values of the regularization parameter; (2) PSM provides a high precision dual certificate stopping criterion; (3) PSM yields sparse solutions through very few iterations, and the solution sparsity significantly reduces the computational cost per iteration. Read More

We propose a general theory for studying the geometry of nonconvex objective functions with underlying symmetric structures. In specific, we characterize the locations of stationary points and the null space of the associated Hessian matrices via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global geometry of the low-rank matrix factorization problem. Read More

This paper studies the matrix completion problem under arbitrary sampling schemes. We propose a new estimator incorporating both max-norm and nuclear-norm regularization, based on which we can conduct efficient low-rank matrix recovery using a random subset of entries observed with additive noise under general non-uniform and unknown sampling distributions. This method significantly relaxes the uniform sampling assumption imposed for the widely used nuclear-norm penalized approach, and makes low-rank matrix recovery feasible in more practical settings. Read More

We consider the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing a non-convex objective function. Read More

We propose a new family of combinatorial inference problems for graphical models. Unlike classical statistical inference where the main interest is point estimation or parameter testing, combinatorial inference aims at testing the global structure of the underlying graph. Examples include testing the graph connectivity, the presence of a cycle of certain size, or the maximum degree of the graph. Read More

Accurately drawing 3D objects is difficult for untrained individuals, as it requires an understanding of perspective and its effects on geometry and proportions. Step-by-step tutorials break the complex task of sketching an entire object down into easy-to-follow steps that even a novice can follow. However, creating such tutorials requires expert knowledge and is a time-consuming task. Read More

The cyclic block coordinate descent-type (CBCD-type) methods, which performs iterative updates for a few coordinates (a block) simultaneously throughout the procedure, have shown remarkable computational performance for solving strongly convex minimization problems. Typical applications include many popular statistical machine learning methods such as elastic-net regression, ridge penalized logistic regression, and sparse additive regression. Existing optimization literature has shown that for strongly convex minimization, the CBCD-type methods attain iteration complexity of $\cO(p\log(1/\epsilon))$, where $\epsilon$ is a pre-specified accuracy of the objective value, and $p$ is the number of blocks. Read More

A comparative study of the radiation-induced magnetoresistance oscillations in the high mobility GaAs/AlGaAs heterostructure two dimensional electron system (2DES) under linearly- and circularlypolarized microwave excitation indicates a profound difference in the response observed upon rotating the microwave launcher for the two cases, although circularly polarized microwave radiation induced magnetoresistance oscillations observed at low magnetic fields are similar to the oscillations observed with linearly polarized radiation. For the linearly polarized radiation, the magnetoresistive response is a strong sinusoidal function of the launcher rotation (or linear polarization) angle, {\theta}. For circularly polarized radiation, the oscillatory magnetoresistive response is hardly sensitive to {\theta}. Read More

Many statistical machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. In this paper, we study this fundamental tradeoff through a SQRT-Lasso problem for sparse linear regression and sparse precision matrix estimation in high dimensions. We explain how novel optimization techniques help address these computational challenges. Read More

We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear speedup. Read More

Sparse generalized eigenvalue problem plays a pivotal role in a large family of high-dimensional learning tasks, including sparse Fisher's discriminant analysis, canonical correlation analysis, and sufficient dimension reduction. However, the theory of sparse generalized eigenvalue problem remains largely unexplored. In this paper, we exploit a non-convex optimization perspective to study this problem. Read More

Principal component analysis (PCA) has been a prominent tool for high-dimensional data analysis. Online algorithms that estimate the principal component by processing streaming data are of tremendous practical and theoretical interests. Despite its rich applications, theoretical convergence analysis remains largely open. Read More

Heterogeneity is an unwanted variation when analyzing aggregated datasets from multiple sources. Though different methods have been proposed for heterogeneity adjustment, no systematic theory exists to justify these methods. In this work, we propose a generic framework named ALPHA (short for Adaptive Low-rank Principal Heterogeneity Adjustment) to model, estimate, and adjust heterogeneity from the original data. Read More

Linear polarization angle, $\theta$, dependent measurements of the microwave radiation-induced oscillatory magnetoresistance, $R_{xx}$, in high mobility GaAs/AlGaAs 2D electron devices have shown a $\theta$ dependence in the oscillatory amplitude along with magnetic field, frequency, and extrema-dependent phase shifts, $\theta_{0}$. Here, we suggest a microwave frequency dependence of $\theta_{0} (f)$ using an analysis that averages over other smaller contributions, when those contributions are smaller than estimates of the experimental uncertainty. Read More

A massive dataset often consists of a growing number of (potentially) heterogeneous sub-populations. This paper is concerned about testing various forms of heterogeneity arising from massive data. In a general nonparametric framework, a set of testing procedures are designed to accommodate a growing number of sub-populations, denoted as $s$, with computational feasibility. Read More

Let $f:\mathbb{CP}^2\dashrightarrow\mathbb{CP^2}$ be a rational map with algebraic and topological degrees both equal to $d\geq 2$. Little is known in general about the ergodic properties of such maps. We show here, however, that for an open set of automorphisms $T:\mathbb{CP}^2\to\mathbb{CP}^2$, the perturbed map $T\circ f$ admits exactly two ergodic measures of maximal entropy $\log d$, one of saddle and one of repelling type. Read More

We study the fundamental tradeoffs between computational tractability and statistical accuracy for a general family of hypothesis testing problems with combinatorial structures. Based upon an oracle model of computation, which captures the interactions between algorithms and data, we establish a general lower bound that explicitly connects the minimum testing risk under computational budget constraints with the intrinsic probabilistic and combinatorial structures of statistical problems. This lower bound mirrors the classical statistical lower bound by Le Cam (1986) and allows us to quantify the optimal statistical performance achievable given limited computational budgets in a systematic fashion. Read More

We propose a novel class of dynamic nonparanormal graphical models, which allows us to model high dimensional heavy-tailed systems and the evolution of their latent network structures. Under this model we develop statistical tests for presence of edges both locally at a fixed index value and globally over a range of values. The tests are developed for a high-dimensional regime, are robust to model selection mistakes and do not require commonly assumed minimum signal strength. Read More

We study parameter estimation and asymptotic inference for sparse nonlinear regression. More specifically, we assume the data are given by $y = f( x^\top \beta^* ) + \epsilon$, where $f$ is nonlinear. To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator. Read More

We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations to a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-estimation theory of confidence regions for high dimensional problems. Read More

This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low dimensional and high dimensional settings, we address the important question of how to choose $k$ as $n$ grows large, providing a theoretical upper bound on $k$ such that the information loss due to the divide and conquer algorithm is negligible. Read More

We proposed a general Principal Orthogonal complEment Thresholding (POET) framework for large-scale covariance matrix estimation based on an approximate factor model. A set of high level sufficient conditions for the procedure to achieve optimal rates of convergence under different matrix norms were brought up to better understand how POET works. Such a framework allows us to recover the results for sub-Gaussian in a more transparent way that only depends on the concentration properties of the sample covariance matrix. Read More

We propose a computational framework named iterative local adaptive majorize-minimization (I-LAMM) to simultaneously control algorithmic complexity and statistical error when fitting high dimensional models. I-LAMM is a two-stage algorithmic implementation of the local linear approximation to a family of folded concave penalized quasi-likelihood. The first stage solves a convex program with a crude precision tolerance to obtain a coarse initial estimator, which is further refined in the second stage by iteratively solving a sequence of convex programs with smaller precision tolerances. Read More

Linear regression studies the problem of estimating a model parameter $\beta^* \in \mathbb{R}^p$, from $n$ observations $\{(y_i,\mathbf{x}_i)\}_{i=1}^n$ from linear model $y_i = \langle \mathbf{x}_i,\beta^* \rangle + \epsilon_i$. We consider a significant generalization in which the relationship between $\langle \mathbf{x}_i,\beta^* \rangle$ and $y_i$ is noisy, quantized to a single bit, potentially nonlinear, noninvertible, as well as unknown. This model is known as the single-index model in statistics, and, among other things, it represents a significant generalization of one-bit compressed sensing. Read More

We consider the problem of estimating undirected triangle-free graphs of high dimensional distributions. Triangle-free graphs form a rich graph family which allows arbitrary loopy structures but 3-cliques. For inferential tractability, we propose a graphical Fermat's principle to regularize the distribution family. Read More

Estimating large covariance and precision matrices are fundamental in modern multivariate analysis. The problems arise from statistical analysis of large panel economics and finance data. The covariance matrix reveals marginal correlations between variables, while the precision matrix encodes conditional correlations between pairs of variables given the remaining variables. Read More

The metal contacts on 2D black phosphorus field-effect transistor and photodetectors are studied. The metal work functions can significantly impact the Schottky barrier at the metal-semiconductor contact in black phosphorus devices. Higher metal work functions lead to larger output hole currents in p-type transistors, while ambipolar characteristics can be observed with lower work function metals. Read More

We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R\&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. We aim to identify the sparsity pattern and select the best alternative before the finite budget is exhausted. We derive a knowledge gradient policy for sparse linear models (KGSpLin) with group Lasso penalty. Read More

We propose a novel high dimensional nonparametric model named ATLAS which naturally generlizes the sparse additive model. Given a covariate of interest $X_j$, the ATLAS model assumes the mean function can be locally approximated by a sparse additive function whose sparsity pattern may vary from the global perspective. We propose to infer the marginal influence function $f_j^*(z) = \mathbb{E}[f(X_1,\ldots, X_d) \mid X_j = z]$ using a new kernel-sieve approach that combines the local kernel regression with the B-spline basis approximation. Read More

Many high dimensional sparse learning problems are formulated as nonconvex optimization. A popular approach to solve these nonconvex optimization problems is through convex relaxations such as linear and semidefinite programming. In this paper, we study the statistical limits of convex relaxations. Read More

There has been an increasing interest in testing the equality of large Pearson's correlation matrices. However, in many applications it is more important to test the equality of large rank-based correlation matrices since they are more robust to outliers and nonlinearity. Unlike the Pearson's case, testing the equality of large rank-based statistics has not been well explored and requires us to develop new methods and theory. Read More

This paper proposes a unified framework to quantify local and global inferential uncertainty for high dimensional nonparanormal graphical models. In particular, we consider the problems of testing the presence of a single edge and constructing a uniform confidence subgraph. Due to the presence of unknown marginal transformations, we propose a pseudo likelihood based inferential approach. Read More

We propose a novel sparse tensor decomposition method, namely Tensor Truncated Power (TTP) method, that incorporates variable selection into the estimation of decomposition components. The sparsity is achieved via an efficient truncation step embedded in the tensor power iteration. Our method applies to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixture and mixtures of sparse regressions. Read More

We propose a bootstrap-based robust high-confidence level upper bound (Robust H-CLUB) for assessing the risks of large portfolios. The proposed approach exploits rank-based and quantile-based estimators, and can be viewed as a robust extension of the H-CLUB method (Fan et al., 2015). Read More

We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data. Different from the existing mixed graphical models, we allow the nodewise conditional distributions to be semiparametric generalized linear models with unspecified base measure functions. Thus, one advantage of our method is that it is unnecessary to specify the type of each node and the method is more convenient to apply in practice. Read More

We propose a robust inferential procedure for assessing uncertainties of parameter estimation in high-dimensional linear models, where the dimension $p$ can grow exponentially fast with the sample size $n$. Our method combines the de-biasing technique with the composite quantile function to construct an estimator that is asymptotically normal. Hence it can be used to construct valid confidence intervals and conduct hypothesis tests. Read More

We provide a general theory of the expectation-maximization (EM) algorithm for inferring high dimensional latent variable models. In particular, we make two contributions: (i) For parameter estimation, we propose a novel high dimensional EM algorithm which naturally incorporates sparsity structure into parameter estimation. With an appropriate initialization, this algorithm converges at a geometric rate and attains an estimator with the (near-)optimal statistical rate of convergence. Read More

We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Read More

The pathwise coordinate optimization is one of the most important computational frameworks for high dimensional convex and nonconvex sparse learning problems. It differs from the classical coordinate optimization algorithms in three salient features: {\it warm start initialization}, {\it active set updating}, and {\it strong rule for coordinate preselection}. Such a complex algorithmic structure grants superior empirical performance, but also poses significant challenge to theoretical analysis. Read More

This paper proposes a decorrelation-based approach to test hypotheses and construct confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics, establish their semiparametric optimality. Read More

We propose a likelihood ratio based inferential framework for high dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high dimensional data analysis, including incomplete data, selection bias, and heterogeneous multitask learning. Our work has three main contributions. Read More

We report the results of a combined microwave polarization-dependence and power-dependence study of the microwave-radiation-induced magnetoresistance oscillations in high mobility GaAs/AlGaAs heterostructure devices at liquid helium temperatures. The diagonal resistance was measured with the magnetic field fixed at the extrema of the radiation-induced magnetoresistance oscillations, as the microwave power was varied at a number of microwave polarization angles. The results indicate a nonlinear relation between the oscillatory peak or valley magnetoresistance and the microwave power, as well as a cosine square relation between the oscillatory peak or valley magnetoresistance and the microwave polarization angle. Read More

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function involving expected values or a composition of two expected-value functions, i.e. Read More

Phosphorus is one of the most abundant elements preserved in earth, constructing with a fraction of ~0.1% of the earth crust. In general, phosphorus has several allotropes. Read More

We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring heterogeneity of each sub-population. In particular, we propose an aggregation type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and asymptotic distribution as if there were no heterogeneity. Read More

Layered two-dimensional (2D) semiconducting transition metal dichalcogenides (TMD) have been widely isolated, synthesized, and characterized recently. Numerous 2D materials are identified as the potential candidates as channel materials for future thin film technology due to their high mobility and the exhibiting bandgaps. While many TMD filed-effect transistors (FETs) have been widely demonstrated along with a significant progress to clearly understand the device physics, large contact resistance at metal/semiconductor interface still remain a challenge. Read More

We consider the problem of testing mutual independence of all entries in a d-dimensional random vector X=(X_1,... Read More

Low-resistivity metal-semiconductor (M-S) contact is one of the urgent challenges in the research of 2D transition metal dichalcogenides (TMDs). Here, we report a chloride molecular doping technique which greatly reduces the contact resistance (Rc) in the few-layer WS2 and MoS2. After doping, the Rc of WS2 and MoS2 have been decreased to 0. Read More