Show simple item record

dc.contributor.author: Khaled, Ahmed
dc.contributor.author: Mishchenko, Konstantin
dc.contributor.author: Richtarik, Peter
dc.date.accessioned: 2019-11-27T12:58:31Z
dc.date.available: 2019-11-27T12:58:31Z
dc.date.issued: 2019-09-10
dc.identifier.uri: http://hdl.handle.net/10754/660289
dc.description.abstract: We revisit the local Stochastic Gradient Descent (local SGD) method and prove new convergence rates. We close the gap in the theory by showing that it works under unbounded gradients and extend its convergence to weakly convex functions. Furthermore, by changing the assumptions, we manage to get new bounds that explain in what regimes local SGD is faster than its non-local version. For instance, if the objective is strongly convex, we show that, up to constants, it is sufficient to synchronize $M$ times in total, where $M$ is the number of nodes. This improves upon the known requirement of Stich (2018) of $\sqrt{TM}$ synchronization times in total, where $T$ is the total number of iterations, which helps to explain the empirical success of local SGD.
dc.publisher: arXiv
dc.relation.url: https://arxiv.org/pdf/1909.04746
dc.rights: Archived with thanks to arXiv
dc.title: Better Communication Complexity for Local SGD
dc.type: Preprint
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.eprint.version: Pre-print
dc.contributor.institution: Cairo University
dc.identifier.arxivid: 1909.04746
kaust.person: Mishchenko, Konstantin
kaust.person: Richtarik, Peter
refterms.dateFOA: 2019-11-27T12:58:52Z
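
The abstract above concerns local SGD, where $M$ workers run SGD independently and periodically synchronize by averaging their iterates. Below is a minimal, illustrative Python/NumPy sketch of that scheme; the quadratic per-worker objectives, step size, noise scale, and synchronization interval H are assumptions chosen for illustration and do not reproduce the paper's analysis or experiments.

import numpy as np

# Minimal local SGD sketch (illustrative assumptions, not the paper's setup).
# M workers each take H local SGD steps between synchronizations, where the
# iterates are averaged across all workers.

rng = np.random.default_rng(0)
M, d = 8, 10                 # number of workers, dimension
T, H = 1000, 50              # total iterations, local steps per round
lr = 0.01

# Worker m holds a strongly convex quadratic f_m(x) = 0.5 * ||x - b_m||^2,
# with distinct optima b_m modeling heterogeneous data.
b = rng.normal(size=(M, d))
x = np.zeros((M, d))         # per-worker iterates, identical at the start

for t in range(T):
    # Local step on each worker: exact gradient plus stochastic noise
    noise = rng.normal(scale=0.1, size=(M, d))
    grad = (x - b) + noise
    x -= lr * grad
    # Communication round: average the iterates every H steps
    if (t + 1) % H == 0:
        x[:] = x.mean(axis=0)

x_avg = x.mean(axis=0)
print("distance to optimum:", np.linalg.norm(x_avg - b.mean(axis=0)))

With H = 1 this reduces to fully synchronized (minibatch-style) SGD; larger H trades communication for local progress, which is the regime of synchronization frequency that the paper's bounds address.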


Files in this item

Name: Preprintfile1.pdf
Size: 408.6 KB
Format: PDF
Description: Pre-print
