The growth of data, the need for scalability, and the complexity of models used in modern machine learning call for distributed implementations. Yet, as
of today, distributed machine learning frameworks have largely ignored the
possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study
the robustness to Byzantine failures at the fundamental level of stochastic
gradient descent (SGD), the heart of most machine learning algorithms. Assuming
a set of $n$ workers, up to $f$ of them being Byzantine, we ask how robust SGD can be, without limiting either the dimension or the size of the parameter space.
We first show that no gradient descent update rule based on a linear
combination of the vectors proposed by the workers (i.e., current approaches)
tolerates a single Byzantine failure. We then formulate a resilience property
of the update rule capturing the basic requirements to guarantee convergence
despite $f$ Byzantine workers. We finally propose Krum, an update rule that satisfies this resilience property. For a $d$-dimensional
learning problem, the time complexity of Krum is $O(n^2 \cdot (d + \log n))$.
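For concreteness, below is a minimal sketch of a Krum-style aggregation step, assuming the scoring rule defined in the body of the paper (each worker's vector is scored by the sum of squared distances to its $n - f - 2$ closest peers, and the lowest-scoring vector is selected); the function name `krum` and the use of NumPy are illustrative choices, not part of the paper.

```python
import numpy as np

def krum(grads, f):
    """Sketch of a Krum-style aggregation rule (assumed scoring: sum of
    squared distances to the n - f - 2 closest other vectors; the vector
    with the smallest score is returned).

    grads: array-like of shape (n, d), one gradient estimate per worker
    f: assumed upper bound on the number of Byzantine workers
    """
    grads = np.asarray(grads, dtype=float)
    n = len(grads)
    assert n > 2 * f + 2, "Krum requires n > 2f + 2"

    # Pairwise squared Euclidean distances: O(n^2 * d).
    diffs = grads[:, None, :] - grads[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)

    # Score each worker by its n - f - 2 nearest other vectors;
    # sorting each row costs O(n log n), giving O(n^2 (d + log n)) overall.
    scores = np.empty(n)
    for i in range(n):
        d_i = np.sort(np.delete(dists[i], i))
        scores[i] = np.sum(d_i[: n - f - 2])

    # The vector with the smallest score is the aggregated update.
    return grads[np.argmin(scores)]
```

The quadratic pass over pairwise distances dominates the cost, which matches the stated $O(n^2 \cdot (d + \log n))$ complexity under this assumed scoring rule.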