Communication-efficient Distributed Sparse Linear Discriminant Analysis

We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high dimensional regime. Our method distributes the data of size $N$ into $m$ machines, and estimates a local sparse LDA estimator on each machine using the data subset of size $N/m$. After the distributed estimation, our method aggregates the debiased local estimators from $m$ machines, and sparsifies the aggregated estimator. We show that the aggregated estimator attains the same statistical rate as the centralized estimation method, as long as the number of machines $m$ is chosen appropriately. Moreover, we prove that our method can attain the model selection consistency under a milder condition than the centralized method. Experiments on both synthetic and real datasets corroborate our theory.

Comments: 29 pages, 2 figures, 2 tables

Similar Publications

This paper introduces the kernel mixture network, a new method for nonparametric estimation of conditional probability densities using neural networks. We model arbitrarily complex conditional densities as linear combinations of a family of kernel functions centered at a subset of training points. The weights are determined by the outer layer of a deep neural network, trained by minimizing the negative log likelihood. Read More


Here, we present a novel approach to solve the problem of reconstructing perceived stimuli from brain responses by combining probabilistic inference with deep learning. Our approach first inverts the linear transformation from latent features to brain responses with maximum a posteriori estimation and then inverts the nonlinear transformation from perceived stimuli to latent features with adversarial training of convolutional neural networks. We test our approach with a functional magnetic resonance imaging experiment and show that it can generate state-of-the-art reconstructions of perceived faces from brain activations. Read More


Implicit models, which allow for the generation of samples but not for point-wise evaluation of probabilities, are omnipresent in real world problems tackled by machine learning and a hot topic of current research. Some examples include data simulators that are widely used in engineering and scientific research, generative adversarial networks (GANs) for image synthesis, and hot-off-the-press approximate inference techniques relying on implicit distributions. The majority of existing approaches to learning implicit models rely on approximating the intractable distribution or optimisation objective for gradient-based optimisation, which is liable to produce inaccurate updates and thus poor models. Read More


Automatic music transcription (AMT) aims to infer a latent symbolic representation of a piece of music (piano-roll), given a corresponding observed audio recording. Transcribing polyphonic music (when multiple notes are played simultaneously) is a challenging problem, due to highly structured overlapping between harmonics. We study whether the introduction of physically inspired Gaussian process (GP) priors into audio content analysis models improves the extraction of patterns required for AMT. Read More


We propose an efficient method to estimate the accuracy of classifiers using only unlabeled data. We consider a setting with multiple classification problems where the target classes may be tied together through logical constraints. For example, a set of classes may be mutually exclusive, meaning that a data instance can belong to at most one of them. Read More


Gradient matching is a promising tool for learning parameters and state dynamics of ordinary differential equations. It is a grid free inference approach which for fully observable systems is at times competitive with numerical integration. However for many real-world applications, only sparse observations are available or even unobserved variables are included in the model description. Read More


We present a generic framework for trading off fidelity and cost in computing stochastic gradients when the costs of acquiring stochastic gradients of different quality are not known a priori. We consider a mini-batch oracle that distributes a limited query budget over a number of stochastic gradients and aggregates them to estimate the true gradient. Since the optimal mini-batch size depends on the unknown cost-fidelity function, we propose an algorithm, {\it EE-Grad}, that sequentially explores the performance of mini-batch oracles and exploits the accumulated knowledge to estimate the one achieving the best performance in terms of cost-efficiency. Read More


Autoregressive models are among the best performing neural density estimators. We describe an approach for increasing the flexibility of an autoregressive model, based on modelling the random numbers that the model uses internally when generating data. By constructing a stack of autoregressive models, each modelling the random numbers of the next model in the stack, we obtain a type of normalizing flow suitable for density estimation, which we call Masked Autoregressive Flow. Read More


This article considers algorithmic and statistical aspects of linear regression when the correspondence between the covariates and the responses is unknown. First, a fully polynomial-time approximation scheme is given for the natural least squares optimization problem in any constant dimension. Next, in an average-case and noise-free setting where the responses exactly correspond to a linear function of i. Read More


This paper studies the landscape of empirical risk of deep neural networks by theoretically analyzing its convergence behavior to the population risk as well as its stationary points and properties. For an $l$-layer linear neural network, we prove its empirical risk uniformly converges to its population risk at the rate of $\mathcal{O}(r^{2l}\sqrt{d\log(l)}/\sqrt{n})$ with training sample size of $n$, the total weight dimension of $d$ and the magnitude bound $r$ of weight of each layer. We then derive the stability and generalization bounds for the empirical risk based on this result. Read More