Global Convergence of Arbitrary-Block Gradient Methods for Generalized Polyak-{\L} ojasiewicz Functions
Type
PreprintAuthors
Csiba, DominikRichtarik, Peter

KAUST Department
Computer Science ProgramComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Date
2017-09-09Permanent link to this record
http://hdl.handle.net/10754/626499
Metadata
Show full item recordAbstract
In this paper we introduce two novel generalizations of the theory for gradient descent type methods in the proximal setting. First, we introduce the proportion function, which we further use to analyze all known (and many new) block-selection rules for block coordinate descent methods under a single framework. This framework includes randomized methods with uniform, non-uniform or even adaptive sampling strategies, as well as deterministic methods with batch, greedy or cyclic selection rules. Second, the theory of strongly-convex optimization was recently generalized to a specific class of non-convex functions satisfying the so-called Polyak-{\L}ojasiewicz condition. To mirror this generalization in the weakly convex case, we introduce the Weak Polyak-{\L}ojasiewicz condition, using which we give global convergence guarantees for a class of non-convex functions previously not considered in theory. Additionally, we establish (necessarily somewhat weaker) convergence guarantees for an even larger class of non-convex functions satisfying a certain smoothness assumption only. By combining the two abovementioned generalizations we recover the state-of-the-art convergence guarantees for a large class of previously known methods and setups as special cases of our general framework. Moreover, our frameworks allows for the derivation of new guarantees for many new combinations of methods and setups, as well as a large class of novel non-convex objectives. The flexibility of our approach offers a lot of potential for future research, as a new block selection procedure will have a convergence guarantee for all objectives considered in our framework, while a new objective analyzed under our approach will have a whole fleet of block selection rules with convergence guarantees readily available.Publisher
arXivarXiv
1709.03014Additional Links
http://arxiv.org/abs/1709.03014v1http://arxiv.org/pdf/1709.03014v1