Show simple item record

dc.contributor.author: Loizou, Nicolas
dc.contributor.author: Richtarik, Peter
dc.date.accessioned: 2020-10-01T06:07:23Z
dc.date.available: 2018-01-15T06:10:41Z
dc.date.available: 2020-10-01T06:07:23Z
dc.date.issued: 2020-09-23
dc.date.submitted: 2018-01-18
dc.identifier.citation: Loizou, N., & Richtárik, P. (2020). Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Computational Optimization and Applications. doi:10.1007/s10589-020-00220-z
dc.identifier.issn: 1573-2894
dc.identifier.issn: 0926-6003
dc.identifier.doi: 10.1007/s10589-020-00220-z
dc.identifier.uri: http://hdl.handle.net/10754/626761
dc.description.abstract: In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent: convex quadratic problems. We prove global non-asymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates, and dual function values. We also show that the primal iterates converge at an accelerated linear rate in a somewhat weaker sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
dc.publisher: Springer Nature
dc.relation.url: http://link.springer.com/10.1007/s10589-020-00220-z
dc.rights: Archived with thanks to Computational Optimization and Applications
dc.title: Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods
dc.type: Article
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.identifier.journal: Computational Optimization and Applications
dc.rights.embargodate: 2021-09-23
dc.eprint.version: Post-print
dc.contributor.institution: Mila and DIRO, Université de Montréal, Quebec, Canada
dc.identifier.arxivid: 1712.09677
kaust.person: Richtarik, Peter
dc.date.accepted: 2020-09-23
dc.identifier.eid: 2-s2.0-85091407940
refterms.dateFOA: 2018-06-13T19:21:25Z
dc.date.published-online: 2020-09-23
dc.date.published-print: 2020-12
dc.date.posted: 2017-12-27
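
The abstract above centers on the stochastic heavy ball method, i.e. stochastic gradient descent with a momentum term, analyzed on convex quadratics such as consistent linear least-squares problems. The following is a minimal sketch of that update on a consistent system; the problem sizes, the conservative stepsize rule, the momentum value beta = 0.3, and the helper name sgd_heavy_ball are all illustrative assumptions, not the paper's analyzed or tuned choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Consistent linear least-squares problem: f(x) = (1/2) * ||A x - b||^2,
    # with b = A @ x_true so an exact solution exists (the convex quadratic
    # setting the abstract refers to). Dimensions are arbitrary.
    n, d = 200, 20
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true

    def sgd_heavy_ball(A, b, stepsize, beta, iters=5000):
        """SGD with heavy ball momentum, sampling one row of A per iteration.

        Update: x_{k+1} = x_k - stepsize * g_k + beta * (x_k - x_{k-1}),
        where g_k is the gradient of the sampled term (1/2)(a_i^T x - b_i)^2.
        """
        n, d = A.shape
        x = np.zeros(d)
        x_prev = x.copy()
        for _ in range(iters):
            i = rng.integers(n)              # uniform row sampling
            g = (A[i] @ x - b[i]) * A[i]     # stochastic gradient
            # Tuple assignment evaluates the right-hand side with the old x,
            # so the momentum term uses the pre-update iterate x_{k-1}.
            x, x_prev = x - stepsize * g + beta * (x - x_prev), x
        return x

    # Illustrative hyperparameters: stepsize from the largest squared row
    # norm (a crude bound on the per-sample smoothness), small momentum.
    stepsize = 1.0 / np.max(np.sum(A**2, axis=1))
    x = sgd_heavy_ball(A, b, stepsize=stepsize, beta=0.3)
    print("||x - x_true|| =", np.linalg.norm(x - x_true))

Because the system is consistent, every sampled gradient vanishes at x_true, which is what allows the constant-stepsize method to converge to the solution rather than a neighborhood of it.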


Files in this item

Name: 1712.09677v1.pdf
Size: 2.534 MB
Format: PDF
Description: Preprint
