Stochastic Distributed Learning with Gradient Quantization and Variance Reduction
Type
Preprint
KAUST Department
Computer Science
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Statistics
Statistics Program
Date
2019-04-10
Permanent link to this record
http://hdl.handle.net/10754/653103
Abstract
We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g.\ quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq 1$ that might slow down convergence. For strongly convex functions with condition number $\kappa$ distributed among $n$ machines, we (i) give a scheme that converges in $\mathcal{O}((\kappa + \kappa \frac{\omega}{n} + \omega)\log(1/\epsilon))$ steps to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, each worker having less than $m$ components, we (ii) present novel variance reduced schemes that converge in $\mathcal{O}((\kappa + \kappa \frac{\omega}{n} + \omega + m)\log(1/\epsilon))$ steps to arbitrary accuracy $\epsilon > 0$. These are the first methods that achieve linear convergence for arbitrary quantized updates. We also (iii) give analysis for the weakly convex and non-convex cases and (iv) verify in experiments that our novel variance reduced schemes are more efficient than the baselines.
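To make the compressed-gradient setting concrete, the following is a minimal sketch (not taken from the paper) of one standard unbiased compressor, random-k sparsification, under the common assumption $\mathbb{E}[Q(x)] = x$ and $\mathbb{E}\|Q(x)\|^2 \leq \omega \|x\|^2$ with $\omega = d/k \geq 1$; the function names and the toy averaging example are illustrative only.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng=None) -> np.ndarray:
    """Unbiased random-k sparsification: keep k of the d coordinates,
    rescaled by d/k so that E[Q(x)] = x. Its second moment satisfies
    E||Q(x)||^2 = (d/k) ||x||^2, i.e. omega = d/k >= 1 (assumed convention)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # coordinates to transmit
    q = np.zeros_like(x)
    q[idx] = (d / k) * x[idx]                   # rescale to preserve unbiasedness
    return q

# Toy usage: each of n workers compresses its gradient before sending it
# to the server, which averages the received sparse updates.
rng = np.random.default_rng(0)
grads = [rng.standard_normal(10) for _ in range(4)]           # gradients from n = 4 workers
avg_update = np.mean([rand_k(g, k=2, rng=rng) for g in grads], axis=0)
```

Averaging over $n$ workers is what reduces the effective compression variance by roughly a factor of $n$, which is reflected in the $\kappa\frac{\omega}{n}$ term of the rates quoted above.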
Sponsors
The authors would like to thank Xun Qian for the careful checking of the proofs and for spotting several typos in the analysis.
Publisher
arXiv
arXiv
1904.05115
Additional Links
https://arxiv.org/abs/1904.05115
https://arxiv.org/pdf/1904.05115