
dc.contributor.authorNguyen, Lam M.
dc.contributor.authorNguyen, Phuong Ha
dc.contributor.authorRichtarik, Peter
dc.contributor.authorScheinberg, Katya
dc.contributor.authorTakac, Martin
dc.contributor.authorvan Dijk, Marten
dc.date.accessioned2021-07-12T08:02:32Z
dc.date.available2019-05-29T09:02:31Z
dc.date.available2021-07-12T08:02:32Z
dc.date.issued2019
dc.identifier.issn1532-4435
dc.identifier.urihttp://hdl.handle.net/10754/653120
dc.description.abstractThe classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective function is strongly convex. In Bottou et al. (2018), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. We show that for stochastic problems arising in machine learning such a bound always holds; we also propose an alternative convergence analysis of SGD in the diminishing learning rate regime. We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime in the case of diminishing learning rates. It is well known that SGD converges if the sequence of learning rates {η_t} satisfies ∑_{t=0}^{∞} η_t → ∞ and ∑_{t=0}^{∞} η_t² < ∞. We show the convergence of SGD for a strongly convex objective function without the bounded gradient assumption when {η_t} is a diminishing sequence and ∑_{t=0}^{∞} η_t → ∞. In other words, we extend the current state-of-the-art class of learning rates for which convergence of SGD can be guaranteed.
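The learning-rate condition stated in the abstract can be illustrated with a minimal sketch (an illustration only, not the paper's code or experiments): plain SGD on a least-squares problem with the diminishing schedule η_t = η_0/(1+t), which satisfies ∑ η_t → ∞ (and also ∑ η_t² < ∞). The data, objective, and constants below are assumptions made up for the example.

import numpy as np

# Illustrative sketch (not from the paper): SGD on a least-squares objective
# with a diminishing step size η_t = η_0 / (1 + t).

rng = np.random.default_rng(0)
n, d = 1000, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
eta0 = 0.1
for t in range(10_000):
    i = rng.integers(n)                      # sample one data point uniformly
    grad_i = (A[i] @ w - b[i]) * A[i]        # stochastic gradient of 0.5*(a_i·w - b_i)^2
    eta_t = eta0 / (1 + t)                   # diminishing learning rate
    w -= eta_t * grad_i

print("mean squared residual:", np.mean((A @ w - b) ** 2))

With η_t = η_0/(1+t), the partial sums of η_t grow like log T, so the divergent-sum condition highlighted in the abstract is met while the step sizes still shrink to zero.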
dc.description.sponsorshipPhuong Ha Nguyen and Marten van Dijk were supported in part by AFOSR MURI under award number FA9550-14-1-0351. Katya Scheinberg was partially supported by NSF Grants CCF-1618717 and CCF-1740796. Martin Takáč was partially supported by NSF Grants CCF-1618717, CMMI-1663256 and CCF-1740796.
dc.publisherarXiv
dc.relation.urlhttp://jmlr.org/papers/v20/18-759.html
dc.rightsArchived with thanks to the Journal of Machine Learning Research
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/
dc.subjectStochastic Gradient Algorithms
dc.subjectAsynchronous Stochastic Optimization
dc.subjectSGD
dc.subjectHogwild
dc.subjectbounded gradient
dc.titleNew Convergence Aspects of Stochastic Gradient Algorithms
dc.typeArticle
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.departmentComputer Science Program
dc.identifier.journalJournal of Machine Learning Research
dc.identifier.wosutWOS:000506403100016
dc.eprint.versionPublisher's Version/PDF
dc.contributor.institutionIBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
dc.contributor.institutionDepartment of Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06268, USA
dc.contributor.institutionEdinburgh, UK
dc.contributor.institutionMIPT, Russia
dc.contributor.institutionDepartment of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA
dc.identifier.volume20
dc.identifier.arxivid1811.12403
kaust.personRichtarik, Peter
refterms.dateFOA2019-05-29T09:02:48Z


Files in this item

Name: 18-759.pdf
Size: 1.276 MB
Format: PDF
Description: Publisher's version
