# Zhihua Zhang

## Contact Details

NameZhihua Zhang |
||

Affiliation |
||

Location |
||

## Pubs By Year |
||

## Pub CategoriesComputer Science - Learning (16) Statistics - Machine Learning (16) Computer Science - Numerical Analysis (10) Mathematics - Optimization and Control (3) Physics - Mesoscopic Systems and Quantum Hall Effect (2) Physics - Materials Science (2) Mathematics - Metric Geometry (2) Statistics - Methodology (2) Mathematics - Functional Analysis (2) Computer Science - Data Structures and Algorithms (2) Mathematics - Numerical Analysis (2) Mathematics - Mathematical Physics (1) Computer Science - Cryptography and Security (1) Computer Science - Discrete Mathematics (1) Computer Science - Computation and Language (1) Mathematical Physics (1) Computer Science - Distributed; Parallel; and Cluster Computing (1) Computer Science - Information Theory (1) Mathematics - Information Theory (1) Computer Science - Computer Vision and Pattern Recognition (1) Mathematics - Analysis of PDEs (1) |

## Publications Authored By Zhihua Zhang

Optimization plays a key role in machine learning. Recently, stochastic second-order methods have attracted much attention due to their low computational cost in each iteration. However, these algorithms might perform poorly especially if it is hard to approximate the Hessian well and efficiently. Read More

Online Newton step algorithms usually achieve good performance with less training samples than first order methods, but require higher space and time complexity in each iteration. In this paper, we develop a new sketching strategy called regularized frequent direction (RFD) to improve the performance of online Newton algorithms. Unlike the standard frequent direction (FD) which only maintains a sketching matrix, the RFD introduces a regularization term additionally. Read More

Many machine learning models are reformulated as optimization problems. Thus, it is important to solve a large-scale optimization problem in big data applications. Recently, subsampled Newton methods have emerged to attract much attention for optimization due to their efficiency at each iteration, rectified a weakness in the ordinary Newton method of suffering a high cost in each iteration while commanding a high convergence rate. Read More

Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. Read More

Generalized matrix approximation plays a fundamental role in many machine learning problems, such as CUR decomposition, kernel approximation, and matrix low rank approximation. Especially with today's applications involved in larger and larger dataset, more and more efficient generalized matrix approximation algorithems become a crucially important research issue. In this paper, we find new sketching techniques to reduce the size of the original data matrix to develop new matrix approximation algorithms. Read More

Neural machine translation aims at building a single large neural network that can be trained to maximize translation performance. The encoder-decoder architecture with an attention mechanism achieves a translation performance comparable to the existing state-of-the-art phrase-based systems on the task of English-to-French translation. However, the use of large vocabulary becomes the bottleneck in both training and improving the performance. Read More

Many machine learning models depend on solving a large scale optimization problem. Recently, sub-sampled Newton methods have emerged to attract much attention for optimization due to their efficiency at each iteration, rectified a weakness in the ordinary Newton method of suffering a high cost at each iteration while commanding a high convergence rate. In this work we propose two new efficient Newton-type methods, Refined Sub-sampled Newton and Refined Sketch Newton. Read More

In this paper, we discuss the problem of minimizing the sum of two convex functions: a smooth function plus a non-smooth function. Further, the smooth part can be expressed by the average of a large number of smooth component functions, and the non-smooth part is equipped with a simple proximal mapping. We propose a proximal stochastic second-order method, which is efficient and scalable. Read More

We propose a new input perturbation mechanism for publishing a covariance matrix to achieve $(\epsilon,0)$-differential privacy. Our mechanism uses a Wishart distribution to generate matrix noise. In particular, We apply this mechanism to principal component analysis. Read More

One standard solution for analyzing large natural graphs is to adopt distributed computation on clusters. In distributed computation, graph partitioning (GP) methods assign the vertices or edges of a graph to different machines in a balanced way so that some distributed algorithms can be adapted for. Most of traditional GP methods are offline, which means that the whole graph has been observed before partitioning. Read More

Normalized graph cut (NGC) has become a popular research topic due to its wide applications in a large variety of areas like machine learning and very large scale integration (VLSI) circuit design. Most of traditional NGC methods are based on pairwise relationships (similarities). However, in real-world applications relationships among the vertices (objects) may be more complex than pairwise, which are typically represented as hyperedges in hypergraphs. Read More

Prior optimal CUR decomposition and near optimal column reconstruction methods have been established by combining BSS sampling and adaptive sampling. In this paper, we propose a new approach to the optimal CUR decomposition and near optimal column reconstruction by just using leverage score sampling. In our approach, both the BSS sampling and adaptive sampling are not needed. Read More

In this paper we study nonconvex penalization using Bernstein functions whose first-order derivatives are completely monotone. The Bernstein function can induce a class of nonconvex penalty functions for high-dimensional sparse estimation problems. We derive a thresholding function based on the Bernstein penalty and discuss some important mathematical properties in sparsity modeling. Read More

The singular value decomposition (SVD) is not only a classical theory in matrix computation and analysis, but also is a powerful tool in machine learning and modern data analysis. In this tutorial we first study the basic notion of SVD and then show the central role of SVD in matrices. Using majorization theory, we consider variational principles of singular values and eigenvalues. Read More

The target of $\mathcal{X}$-armed bandit problem is to find the global maximum of an unknown stochastic function $f$, given a finite budget of $n$ evaluations. Recently, $\mathcal{X}$-armed bandits have been widely used in many situations. Many of these applications need to deal with large-scale data sets. Read More

In many learning tasks, structural models usually lead to better interpretability and higher generalization performance. In recent years, however, the simple structural models such as lasso are frequently proved to be insufficient. Accordingly, there has been a lot of work on "superposition-structured" models where multiple structural constraints are imposed. Read More

The power method and block Lanczos method are popular numerical algorithms for computing the truncated singular value decomposition (SVD) and eigenvalue decomposition problems. Especially in the literature of randomized numerical linear algebra, the power method is widely applied to improve the quality of randomized sketching, and relative-error bounds have been well established. Recently, Musco & Musco (2015) proposed a block Krylov subspace method that fully exploits the intermediate results of the power iteration to accelerate convergence. Read More

In this paper, we study the global convergence of majorization minimization (MM) algorithms for solving nonconvex regularized optimization problems. MM algorithms have received great attention in machine learning. However, when applied to nonconvex optimization problems, the convergence of MM algorithms is a challenging issue. Read More

The aim of this paper is to study the obstacle problem with an elliptic operator having degenerate coercivity. We prove the existence of an entropy solution to the obstacle problem under the assumption of $L^{1}-$summability on the data. Meanwhile, we prove that every entropy solution belongs to some Sobolev space $W^{1,q}(\Omega)$. Read More

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at specific points in time. We study the relationship between regret and communication. Read More

First, we extend the results of approximate matrix multiplication from the Frobenius norm to the spectral norm. Second, We develop a class of fast approximate generalized linear regression algorithms with respect to the spectral norm. Finally, We give a fast approximate SVD. Read More

Symmetric positive semi-definite (SPSD) matrix approximation methods have been extensively used to speed up large-scale eigenvalue computation and kernel learning methods. The standard sketch based method, which we call the prototype model, produces relatively accurate approximations, but is inefficient on large square matrices. The Nystr\"om method is highly efficient, but can only achieve low accuracy. Read More

Sparse learning is an important topic in many areas such as machine learning, statistical estimation, signal processing, etc. Recently, there emerges a growing interest on structured sparse learning. In this paper we focus on the $\ell_q$-analysis optimization problem for structured sparse learning ($0< q \leq 1$). Read More

Low-rank matrix completion is an important problem with extensive real-world applications. When observations are uniformly sampled from the underlying matrix entries, existing methods all require the matrix to be incoherent. This paper provides the first working method for coherent matrix completion under the standard uniform sampling model. Read More

In this paper we propose and study an optimization problem over a matrix group orbit that we call \emph{Group Orbit Optimization} (GOO). We prove that GOO can be used to induce matrix decomposition techniques such as singular value decomposition (SVD), LU decomposition, QR decomposition, Schur decomposition and Cholesky decomposition, etc. This gives rise to a unified framework for matrix decomposition and allows us to bridge these matrix decomposition methods. Read More

Symmetric positive semidefinite (SPSD) matrix approximation is an important problem with applications in kernel methods. However, existing SPSD matrix approximation methods such as the Nystr\"om method only have weak error bounds. In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds. Read More

Self-organized V-N co-doped TiO2 nanotube arrays (TNAs) with various doping amount were synthesized by anodizing in association with hydrothermal treatment. Impacts of V-N co-doping on the morphologies, phase structures, and photoelectrochemical properties of the TNAs films were thoroughly investigated. The co-doped TiO2 photocatalysts show remarkably enhanced photocatalytic activity for the CO2 photoreduction to methane under ultraviolet illumination. Read More

Highly ordered TiO2 nanotube arrays (TNAs) codoped with V and N were synthesized by electrochemical anodization in association with hydrothermal treatment. The samples were characterized by field emission scanning electron microscopy, X-ray diffraction and X-ray photoelectron spectroscopy. The photocurrent and photocatalytic activity of codoped TiO2 nanotube arrays were investigated under visible light irradiation. Read More

We obtain operator concavity (convexity) of some functions of two or three variables by using perspectives of regular operator mappings of one or several variables. As an application, we obtain, for $ 0

Read More

Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nystr\"om method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nystr\"om method speedups computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Read More

The notion of matrix entropy was introduced by Tropp and Chen with the aim of measuring the fluctuations of random matrices. It is a certain entropy functional constructed from a representing function with prescribed properties, and Tropp and Chen gave some examples. We give several abstract characterisations of matrix entropies together with a sufficient condition in terms of the second derivative of their representing function. Read More

We are concerned with an approximation problem for a symmetric positive semidefinite matrix due to motivation from a class of nonlinear machine learning methods. We discuss an approximation approach that we call {matrix ridge approximation}. In particular, we define the matrix ridge approximation as an incomplete matrix factorization plus a ridge term. Read More

In this paper we study nonconvex penalization using Bernstein functions. Since the Bernstein function is concave and nonsmooth at the origin, it can induce a class of nonconvex functions for high-dimensional sparse estimation problems. We derive a threshold function based on the Bernstein penalty and give its mathematical properties in sparsity modeling. Read More

In this paper we discuss Bayesian nonconvex penalization for sparse learning problems. We explore a nonparametric formulation for latent shrinkage parameters using subordinators which are one-dimensional L\'{e}vy processes. We particularly study a family of continuous compound Poisson subordinators and a family of discrete compound Poisson subordinators. Read More

In this paper we propose and study a family of sparsity-inducing penalty functions. Since the penalty functions are related to the kinetic energy in special relativity, we call them \emph{kinetic energy plus} (KEP) functions. We construct the KEP function by using the concave conjugate of a $\chi^2$-distance function and present several novel insights into the KEP function with $q=1$. Read More

The CUR matrix decomposition and the Nystr\"{o}m approximation are two important low-rank matrix approximation techniques. The Nystr\"{o}m method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nystr\"{o}m approximation. Read More

The CUR matrix decomposition is an important extension of Nystr\"{o}m approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. Read More

We show that the multi-class support vector machine (MSVM) proposed by Lee et. al. (2004), can be viewed as a MAP estimation procedure under an appropriate probabilistic interpretation of the classifier. Read More

In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Read More

Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs can not directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates of the hinge function. Read More

Spectral clustering is a broad class of clustering procedures in which an intractable combinatorial optimization formulation of clustering is "relaxed" into a tractable eigenvector problem, and in which the relaxed solution is subsequently "rounded" into an approximate discrete solution to the original problem. In this paper we present a novel margin-based perspective on multiway spectral clustering. We show that the margin-based perspective illuminates both the relaxation and rounding aspects of spectral clustering, providing a unified analysis of existing algorithms and guiding the design of new algorithms. Read More

**Category:**Mathematics - Metric Geometry

In this paper, by making use of one of Chen's theorems and the method of mathematical analysis, we refine Edwards-Child's inequality and solve a conjecture posed by Liu. Read More

An n-simplex is called circumscriptible (or edge-incentric) if there is a sphere tangent to all its n(n + 1)/2 edges. We obtain a closed formula for the radius of the circumscribed sphere of the circumscriptible n-simplex, and also prove a double inequality involving the circumradius and the edge-inradius of such simplices. Among this inequality settles affirmatively a part of a problem posed by the authors. Read More