dc.contributor.author: Khaled, Ahmed
dc.contributor.author: Mishchenko, Konstantin
dc.contributor.author: Richtarik, Peter
dc.date.accessioned: 2019-11-27T12:58:31Z
dc.date.available: 2019-11-27T12:58:31Z
dc.date.issued: 2019-09-10
dc.identifier.uri: http://hdl.handle.net/10754/660289
dc.description.abstract: We revisit the local Stochastic Gradient Descent (local SGD) method and prove new convergence rates. We close the gap in the theory by showing that it works under unbounded gradients, and we extend its convergence to weakly convex functions. Furthermore, under different assumptions we derive new bounds that explain in which regimes local SGD is faster than its non-local version. For instance, if the objective is strongly convex, we show that, up to constants, it is sufficient to synchronize $M$ times in total, where $M$ is the number of nodes. This improves upon the known requirement of Stich (2018) of $\sqrt{TM}$ synchronization times in total, where $T$ is the total number of iterations, and helps to explain the empirical success of local SGD.
dc.publisher: arXiv
dc.relation.url: https://arxiv.org/pdf/1909.04746
dc.rights: Archived with thanks to arXiv
dc.title: Better Communication Complexity for Local SGD
dc.type: Preprint
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.eprint.version: Pre-print
dc.contributor.institution: Cairo University
dc.identifier.arxivid: 1909.04746
kaust.person: Mishchenko, Konstantin
kaust.person: Richtarik, Peter
refterms.dateFOA: 2019-11-27T12:58:52Z
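The abstract describes local SGD: each of $M$ nodes runs SGD on its own data and the iterates are averaged only at occasional synchronization points, which is what makes the communication complexity results above meaningful. Below is a minimal Python sketch of that scheme, assuming a generic stochastic gradient oracle; all names (`local_sgd`, `stochastic_grad`, `local_steps`, ...) are illustrative and not taken from the paper.

```python
import numpy as np

def local_sgd(stochastic_grad, x0, num_nodes, local_steps, num_rounds, lr, rng):
    # One copy of the iterate per node.
    xs = [x0.copy() for _ in range(num_nodes)]
    x_avg = x0.copy()
    for _ in range(num_rounds):
        # Each node takes `local_steps` SGD steps without communicating.
        for m in range(num_nodes):
            for _ in range(local_steps):
                xs[m] = xs[m] - lr * stochastic_grad(xs[m], m, rng)
        # Synchronization: average the local iterates across all nodes.
        x_avg = sum(xs) / num_nodes
        xs = [x_avg.copy() for _ in range(num_nodes)]
    return x_avg

# Toy strongly convex example: node m minimizes 0.5 * ||x - b_m||^2
# and observes gradients corrupted by Gaussian noise.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = [np.array([1.0, -2.0]), np.array([3.0, 0.0])]

    def stochastic_grad(x, m, rng):
        return (x - targets[m]) + 0.1 * rng.standard_normal(x.shape)

    x = local_sgd(stochastic_grad, np.zeros(2), num_nodes=2,
                  local_steps=50, num_rounds=4, lr=0.1, rng=rng)
    print(x)  # approaches the average of the targets, (2.0, -1.0)
```

In this notation, the paper's claim is that in the strongly convex setting the number of communication rounds (`num_rounds`) can be as small as the number of nodes, up to constants, rather than growing with the total iteration count $T$ as in the $\sqrt{TM}$ bound of Stich (2018).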

### Files in this item

Name: Preprintfile1.pdf
Size: 408.6 KB
Format: PDF
Description: Pre-print