Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

Handle URI:
http://hdl.handle.net/10754/626761
Title:
Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods
Authors:
Loizou, Nicolas; Richtarik, Peter
Abstract:
In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.
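The stochastic heavy ball method discussed in the abstract augments the stochastic gradient step with a momentum term, x_{k+1} = x_k - γ g_k + β (x_k - x_{k-1}). A minimal sketch on a toy consistent linear system is shown below, using a randomized Kaczmarz-style stochastic gradient (single-row updates); the step size γ, momentum β, and the synthetic data are illustrative choices, not parameters taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy consistent linear system A x = b (illustrative data, not from the paper).
n, d = 100, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star


def sgd_heavy_ball(A, b, gamma=1.0, beta=0.3, iters=5000):
    """SGD with heavy ball momentum on f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2 / ||a_i||^2.

    Update rule: x_{k+1} = x_k - gamma * g_k + beta * (x_k - x_{k-1}),
    where g_k is the stochastic gradient of a single randomly sampled row
    (the normalized, Kaczmarz-style gradient).
    """
    n, d = A.shape
    x_prev = x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        a_i = A[i]
        # Stochastic gradient of the i-th (normalized) term.
        g = (a_i @ x - b[i]) / (a_i @ a_i) * a_i
        x_next = x - gamma * g + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x


x_hat = sgd_heavy_ball(A, b)
print(np.linalg.norm(x_hat - x_star))
```

With β = 0 this reduces to plain randomized Kaczmarz; the momentum term β (x_k - x_{k-1}) is the heavy ball modification whose linear convergence the paper establishes.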
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program
Publisher:
arXiv
Issue Date:
27-Dec-2017
ARXIV:
arXiv:1712.09677
Type:
Preprint
Additional Links:
http://arxiv.org/abs/1712.09677v1; http://arxiv.org/pdf/1712.09677v1
Appears in Collections:
Other/General Submission; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Loizou, Nicolas | en
dc.contributor.author | Richtarik, Peter | en
dc.date.accessioned | 2018-01-15T06:10:41Z | -
dc.date.available | 2018-01-15T06:10:41Z | -
dc.date.issued | 2017-12-27 | en
dc.identifier.uri | http://hdl.handle.net/10754/626761 | -
dc.description.abstract | In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonasymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in the L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesàro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems. | en
dc.publisher | arXiv | en
dc.relation.url | http://arxiv.org/abs/1712.09677v1 | en
dc.relation.url | http://arxiv.org/pdf/1712.09677v1 | en
dc.rights | Archived with thanks to arXiv | en
dc.title | Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods | en
dc.type | Preprint | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.eprint.version | Pre-print | en
dc.contributor.institution | School of Mathematics, The University of Edinburgh, United Kingdom | en
dc.contributor.institution | Moscow Institute of Physics and Technology (MIPT), Dolgoprudny, Moscow, Russia | en
dc.identifier.arxivid | arXiv:1712.09677 | en
kaust.author | Richtarik, Peter | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.