# Haim Avron - IBM T.J. Watson Research Center

## Contact Details

Name: Haim Avron

Affiliation: IBM T.J. Watson Research Center

City: Hawthorne

Country: United States

## Pub Categories

- Computer Science - Data Structures and Algorithms (7)
- Computer Science - Numerical Analysis (7)
- Computer Science - Learning (6)
- Mathematics - Numerical Analysis (6)
- Statistics - Machine Learning (5)
- Computer Science - Distributed, Parallel, and Cluster Computing (2)
- Statistics - Computation (1)

## Publications Authored By Haim Avron

We consider a class of misspecified dynamical models where the governing term is only approximately known. Under the assumption that observations of the system's evolution are accessible for various initial conditions, our goal is to infer a non-parametric correction to the misspecified driving term so as to faithfully represent the system dynamics and to predict the system's evolution for unobserved initial conditions. We model the unknown correction term as a Gaussian Process and analyze the problem of efficient experimental design to find an optimal correction term under constraints such as a limited experimental budget.

Kernel Ridge Regression (KRR) is a simple yet powerful technique for non-parametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets.
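
To make the linear system concrete, here is a minimal sketch of KRR in pure Python. It assumes a Gaussian kernel and solves $(K + \lambda I)\alpha = y$ with plain, unpreconditioned conjugate gradient. The kernel choice, the parameter names `lam` and `bandwidth`, and the use of conjugate gradient are illustrative assumptions, not the method of the paper summarized above.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def kernel_matrix(points, bandwidth=1.0):
    """Dense n-by-n kernel matrix: one row and one column per data point."""
    return [[gaussian_kernel(p, q, bandwidth) for q in points] for p in points]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def shifted_matvec(K, lam, v):
    """Compute (K + lam * I) v without forming the shifted matrix."""
    return [dot(row, v) + lam * v_i for row, v_i in zip(K, v)]

def conjugate_gradient(K, lam, y, tol=1e-10, max_iter=1000):
    """Solve (K + lam * I) alpha = y; the matrix is symmetric positive definite."""
    alpha = [0.0] * len(y)
    r = list(y)              # residual y - (K + lam I) alpha for alpha = 0
    p = list(r)              # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) <= tol:
            break
        Ap = shifted_matvec(K, lam, p)
        step = rs_old / dot(p, Ap)
        alpha = [a + step * p_i for a, p_i in zip(alpha, p)]
        r = [r_i - step * ap_i for r_i, ap_i in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [r_i + (rs_new / rs_old) * p_i for r_i, p_i in zip(r, p)]
        rs_old = rs_new
    return alpha

def krr_fit(points, targets, lam=0.1, bandwidth=1.0):
    """Return the KRR coefficient vector alpha for the training data."""
    K = kernel_matrix(points, bandwidth)
    return conjugate_gradient(K, lam, targets)

def krr_predict(points, alpha, x, bandwidth=1.0):
    """Predict at x as a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(p, x, bandwidth) for a, p in zip(alpha, points))
```

The matrix has one row per data point, which is why both storing it and factoring it directly become impractical as the dataset grows; an iterative solver only needs matrix-vector products, but its iteration count suffers when the system is ill-conditioned.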

The technique of matrix sketching, such as the use of random projections, has been shown in recent years to be a powerful tool for accelerating many important statistical learning techniques. Research has so far focused largely on using sketching for the "vanilla" un-regularized versions of these techniques. Here we study sketching methods for regularized variants of linear regression, low rank approximations, and canonical correlation analysis.

We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nyström method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness.

Computation of the trace of a matrix function plays an important role in many scientific computing applications, including applications in machine learning, computational physics (e.g., lattice quantum chromodynamics), network analysis and computational biology (e.

We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large datasets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g.
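
As an illustration of the Monte Carlo construction mentioned above, the following sketch builds random Fourier features for a Gaussian kernel, whose spectral density is itself Gaussian. It shows only the plain Monte Carlo baseline, not the efficiency improvement the paper studies; the function names, the `bandwidth` parameter, and the seeding scheme are assumptions made for this example.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """Exact shift-invariant Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 s^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def sample_rff_params(dim, num_features, bandwidth=1.0, seed=0):
    """Draw frequencies from the kernel's spectral density and uniform phases.

    For the Gaussian kernel with bandwidth s, each frequency coordinate is
    distributed as N(0, 1 / s^2).
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases

def rff_map(x, freqs, phases):
    """Map x to features z(x) such that z(x) . z(y) approximates k(x, y)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]

def approx_kernel(x, y, freqs, phases):
    """Monte Carlo estimate of k(x, y) as an inner product of feature vectors."""
    zx = rff_map(x, freqs, phases)
    zy = rff_map(y, freqs, phases)
    return sum(a * b for a, b in zip(zx, zy))
```

The estimate is an average over the sampled frequencies, so its error shrinks roughly like the inverse square root of `num_features`; a linear method trained on `rff_map` outputs then stands in for the kernel method.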

In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the underlying statistical dependencies. Kernel methods fit this need well, as they constitute a versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems.

Asynchronous methods for solving systems of linear equations have been researched since Chazan and Miranker's pioneering 1969 paper on chaotic relaxation. The underlying idea of asynchronous methods is to avoid processor idle time by allowing the processors to continue to make progress even if not all progress made by other processors has been communicated to them. Historically, the applicability of asynchronous methods for solving linear equations was limited to certain restricted classes of matrices, such as diagonally dominant matrices.

We describe a Krylov-subspace method for estimating the spectral condition number of a real matrix $A$ or indicating that it is numerically rank deficient. The main difficulty in estimating the condition number is the estimation of the smallest singular value $\sigma_{\min}$ of $A$. Our method estimates this value by solving a consistent linear least-squares problem with a known solution using a specific Krylov-subspace method called LSQR.
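
A short worked inequality suggests why a known solution helps; it is offered as a sketch and not necessarily the exact estimator used in the paper. Assume $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, pick a vector $x$, set $b = Ax$, and let $x_k$ be any iterate produced by the solver. The symbols $x$, $b$, $x_k$, and $e_k$ are notation introduced here for illustration. Because $x$ is known, the error $e_k = x - x_k$ can be computed explicitly, and any nonzero error vector yields an upper bound on $\sigma_{\min}$:

```latex
\kappa_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}},
\qquad
\sigma_{\min} = \min_{v \neq 0} \frac{\|Av\|_2}{\|v\|_2}
\;\le\; \frac{\|A e_k\|_2}{\|e_k\|_2},
\qquad e_k = x - x_k \neq 0 .
```

An upper bound on $\sigma_{\min}$ translates into a lower bound on $\kappa_2(A)$ once $\sigma_{\max}$ is estimated, which is typically the easier of the two quantities.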

We present a fast algorithm for approximate Canonical Correlation Analysis (CCA). Given a pair of tall-and-thin matrices, the proposed algorithm first employs a randomized dimensionality reduction transform to reduce the size of the input matrices, and then applies any CCA algorithm to the new pair of matrices. The algorithm computes an approximate CCA to the original pair of matrices with provable guarantees, while requiring asymptotically fewer operations than the state-of-the-art exact algorithms.

**Affiliations:** ^{1, 2, 3, 4}IBM T.J. Watson Research Center

We describe novel subgradient methods for a broad class of matrix optimization problems involving nuclear norm regularization. Unlike existing approaches, our method executes very cheap iterations by combining low-rank stochastic subgradients with efficient incremental SVD updates, made possible by highly optimized and parallelizable dense linear algebra operations on small matrices. Our practical algorithms always maintain a low-rank factorization of iterates that can be conveniently held in memory and efficiently multiplied to generate predictions in matrix completion settings.

We study subset selection for matrices defined as follows: given a matrix $X \in \mathbb{R}^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $X$ such that the pseudo-inverse of the subsampled matrix has as small a norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation algorithms for this problem with approximation bounds that are optimal up to constant factors.
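
The problem statement above can be written as a single optimization problem. The index set $S$, the submatrix notation $X_S$, and the norm label $\xi$ are notation assumed here for illustration; they do not appear in the abstract.

```latex
\min_{S \subseteq \{1,\dots,m\},\ |S| = k} \; \bigl\| X_S^{+} \bigr\|_{\xi},
\qquad \xi \in \{2, \mathrm{F}\},
```

where $X_S \in \mathbb{R}^{n \times k}$ collects the columns of the input matrix indexed by $S$, $X_S^{+}$ is its pseudo-inverse, and $\xi = 2$ and $\xi = \mathrm{F}$ denote the spectral and Frobenius norms respectively.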

We define the notion of effective stiffness and show that it can be used to build sparsifiers, algorithms that sparsify linear systems arising from finite-element discretizations of PDEs. In particular, we show that sampling $O(n\log n)$ elements according to probabilities derived from effective stiffnesses yields a high quality preconditioner that can be used to solve the linear system in a small number of iterations. Effective stiffness generalizes the notion of effective resistance, a key ingredient of recent progress in developing nearly-linear-time solvers for symmetric diagonally dominant (SDD) linear systems.

This short communication shows that in some cases scalar elliptic finite element matrices cannot be approximated well by an SDD matrix. We also give a theoretical analysis of a simple heuristic method for approximating an element by an SDD matrix.