Show simple item record

dc.contributor.authorNguyen, Lam M.
dc.contributor.authorNguyen, Phuong Ha
dc.contributor.authorDijk, Marten van
dc.contributor.authorRichtarik, Peter
dc.contributor.authorScheinberg, Katya
dc.contributor.authorTakáč, Martin
dc.date.accessioned2019-05-29T07:01:56Z
dc.date.available2019-05-29T07:01:56Z
dc.date.issued2018-02-11
dc.identifier.citationProceedings of the 35th International Conference on Machine Learning, PMLR 80:3747-3755, 2018
dc.identifier.urihttp://hdl.handle.net/10754/653111
dc.description.abstractStochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.
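A minimal illustrative sketch (not the authors' code) of the setting the abstract describes: SGD with a diminishing learning rate on a strongly convex least-squares objective, where the stochastic gradient norm grows with the iterate and therefore cannot be uniformly bounded over the whole space. The problem sizes, step-size schedule, and variable names below are assumptions chosen only for illustration.

    # Illustrative sketch: plain SGD with a diminishing learning rate on a
    # least-squares objective F(w) = (1/2n) * sum_i (a_i^T w - b_i)^2.
    # The per-sample gradient norm grows with ||w||, so the classical
    # "uniformly bounded stochastic gradient" assumption cannot hold globally.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 10                      # illustrative problem size
    A = rng.standard_normal((n, d))
    b = rng.standard_normal(n)

    w = np.zeros(d)
    eta0 = 0.5                           # illustrative initial step size
    for t in range(10_000):
        i = rng.integers(n)              # sample one data point uniformly
        g = (A[i] @ w - b[i]) * A[i]     # stochastic gradient of 0.5*(a_i^T w - b_i)^2
        eta = eta0 / (1 + t)             # diminishing learning rate
        w -= eta * g

    print(np.linalg.norm(A @ w - b) / np.sqrt(n))   # residual after training

The paper's analysis (and its extension to the asynchronous Hogwild! setting) concerns exactly this diminishing-step-size regime, without assuming a uniform bound on the stochastic gradient norm.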
dc.description.sponsorshipThe authors would like to thank the reviewers for useful suggestions which helped to improve the exposition in the paper. The authors also would like to thank Francesco Orabona for his valuable comments and suggestions. Lam M. Nguyen was partially supported by NSF Grant CCF 16-18717. Phuong Ha Nguyen and Marten van Dijk were supported in part by AFOSR MURI under award number FA9550-14-1-0351. Katya Scheinberg was partially supported by NSF Grants CCF 16-18717 and CCF 17-40796. Martin Takac was supported by the U.S. National Science Foundation under award numbers NSF:CCF:1618717, NSF:CMMI:1663256 and NSF:CCF:1740796.
dc.publisherarXiv
dc.relation.urlhttps://arxiv.org/pdf/1802.03801
dc.rightsArchived with thanks to arXiv
dc.titleSGD and Hogwild! Convergence Without the Bounded Gradients Assumption
dc.typePreprint
dc.contributor.departmentComputer Science
dc.contributor.departmentComputer Science Program
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.eprint.versionPre-print
dc.contributor.institutionDepartment of Industrial and Systems Engineering, Lehigh University, USA.
dc.contributor.institutionIBM Thomas J. Watson Research Center, USA.
dc.contributor.institutionDepartment of Electrical and Computer Engineering, University of Connecticut, USA.
dc.contributor.institutionEdinburgh, UK
dc.contributor.institutionMIPT, Russia.
dc.identifier.arxivid1802.03801
kaust.personRichtarik, Peter
refterms.dateFOA2019-05-29T07:02:09Z


Files in this item

Name: 1802.03801.pdf
Size: 1.156 MB
Format: PDF
Description: Preprint
