Simple item record

dc.contributor.author: Safaryan, Mher
dc.contributor.author: Richtarik, Peter
dc.date.accessioned: 2019-11-27T11:54:03Z
dc.date.available: 2019-11-27T11:54:03Z
dc.date.issued: 2019-05-30
dc.identifier.uri: http://hdl.handle.net/10754/660282
dc.description.abstract: Various gradient compression schemes have been proposed to mitigate the communication cost in distributed training of large-scale machine learning models. Sign-based methods, such as signSGD, have recently been gaining popularity because of their simple compression rule and connection to adaptive gradient methods, like ADAM. In this paper, we perform a general analysis of sign-based methods for non-convex optimization. Our analysis is built on intuitive bounds on success probabilities and relies neither on special noise distributions nor on the boundedness of the variance of stochastic gradients. Extending the theory to the distributed setting within a parameter server framework, we assure exponentially fast variance reduction with respect to the number of nodes, while maintaining 1-bit compression in both directions and using small mini-batch sizes. We validate our theoretical findings experimentally.
dc.publisher: arXiv
dc.relation.url: https://arxiv.org/pdf/1905.12938
dc.rights: Archived with thanks to arXiv
dc.title: On Stochastic Sign Descent Methods
dc.type: Preprint
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Computer Science Program
dc.eprint.version: Pre-print
dc.contributor.institution: MIPT, Russia
dc.identifier.arxivid: 1905.12938
kaust.person: Safaryan, Mher
kaust.person: Richtarik, Peter
refterms.dateFOA: 2019-11-27T11:54:45Z
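
The abstract describes sign-based gradient compression (as in signSGD) with 1-bit communication in both directions under a parameter server framework. The following is a minimal, illustrative Python sketch of that idea, assuming NumPy and a toy quadratic objective; the function names (sign_compress, majority_vote, distributed_sign_step) and all parameters are assumptions for illustration, not the authors' actual algorithm or code.

# Illustrative sketch only: sign compression on the uplink, coordinate-wise
# majority vote on the downlink. Names and the toy objective are assumptions,
# not taken from the paper or its implementation.
import numpy as np

def sign_compress(g):
    # 1-bit compression: keep only the sign of each gradient coordinate.
    return np.sign(g)

def majority_vote(signs):
    # Server aggregation: coordinate-wise majority vote over worker signs.
    # The result is again a sign vector, so the downlink also uses 1 bit/coordinate.
    return np.sign(np.sum(signs, axis=0))

def distributed_sign_step(x, stochastic_grads, lr):
    # One parameter-server step: workers send signs, server broadcasts the vote.
    worker_signs = [sign_compress(g) for g in stochastic_grads]
    aggregated = majority_vote(np.stack(worker_signs))
    return x - lr * aggregated

if __name__ == "__main__":
    # Toy usage on f(x) = 0.5 * ||x||^2 with noisy stochastic gradients x + noise.
    rng = np.random.default_rng(0)
    x = rng.normal(size=10)
    n_workers, lr = 8, 0.05
    for _ in range(100):
        grads = [x + rng.normal(scale=1.0, size=x.shape) for _ in range(n_workers)]
        x = distributed_sign_step(x, grads, lr)
    print("final ||x|| ~=", np.linalg.norm(x))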


Files in this item

Name: Preprintfile1.pdf
Size: 3.555 MB
Format: PDF
Description: Pre-print
