An Empirical Study of the Distributed Ellipsoidal Trust Region Method for Large Batch Training
Advisors: Keyes, David E.
Permanent link to this record: http://hdl.handle.net/10754/667327
Abstract: Neural network optimization is dominated by first-order methods because of their low computational cost per iteration. However, first-order optimization has been shown to be prone to converging to sharp minima when training with large batch sizes. As the batch size increases, so does the statistical stability of the problem, a regime that is well suited to second-order optimization methods. In this thesis, we study a distributed ellipsoidal trust region model for neural networks. We use a block-diagonal approximation of the Hessian, assigning consecutive layers of the network to each process, and solve in parallel for the update direction of each subset of the parameters. We show that our optimizer is well suited to large batch training as well as to an increasing number of processes.
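To make the block-diagonal idea concrete, here is a minimal sketch in pure Python, not the thesis implementation. It assumes a quadratic model m(s) = gᵀs + ½ sᵀHs with H block-diagonal, one block per group of consecutive layers, and an ellipsoidal constraint ‖diag(d) s‖ ≤ Δ applied separately to each block (this per-block radius is an assumption). Because the blocks do not interact, each subproblem can be solved independently and the steps concatenated. As a stand-in for whatever subproblem solver the thesis uses, each block takes the Cauchy point. Threads take the place of separate processes here. The names `assign_layers`, `cauchy_step_ellipsoidal`, and `block_diagonal_tr_step` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def assign_layers(n_layers, n_procs):
    """Split layer indices into n_procs contiguous groups (consecutive layers per process)."""
    base, extra = divmod(n_layers, n_procs)
    groups, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(H, v):
    return [dot(row, v) for row in H]


def cauchy_step_ellipsoidal(g, H, d, radius):
    """Cauchy point of min g^T s + 0.5 s^T H s subject to ||diag(d) s|| <= radius.

    The substitution u = diag(d) s turns the ellipsoid into a ball, giving the
    scaled gradient g_hat = D^{-1} g and scaled Hessian D^{-1} H D^{-1}.
    """
    g_hat = [gi / di for gi, di in zip(g, d)]
    g_norm = math.sqrt(dot(g_hat, g_hat))
    if g_norm == 0.0:
        return [0.0] * len(g)
    # Curvature along g_hat in scaled space: (D^{-1} g_hat)^T H (D^{-1} g_hat).
    v = [gi / di for gi, di in zip(g_hat, d)]
    curv = dot(v, matvec(H, v))
    # Standard Cauchy-point step length along -g_hat within the ball.
    tau = 1.0 if curv <= 0.0 else min(g_norm ** 3 / (radius * curv), 1.0)
    u = [-tau * radius / g_norm * gi for gi in g_hat]
    return [ui / di for ui, di in zip(u, d)]  # map back: s = D^{-1} u


def block_diagonal_tr_step(blocks, radius, n_workers):
    """Solve each (g_p, H_p, d_p) block independently, then concatenate the steps.

    With a block-diagonal H there is no cross-block curvature, so the blocks decouple.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        steps = list(pool.map(lambda b: cauchy_step_ellipsoidal(*b, radius), blocks))
    return [x for step in steps for x in step]


if __name__ == "__main__":
    blocks = [
        ([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),  # block for process 0
        ([0.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0]),  # block for process 1
    ]
    print(assign_layers(5, 2))
    print(block_diagonal_tr_step(blocks, radius=1.0, n_workers=2))
```

In a real distributed setting, each process would own its layers' gradient, curvature block, and scaling vector, so only the final step (or the updated parameters) needs to be communicated. A Cauchy point is the simplest solver that respects the trust region. A practical optimizer would likely use a more accurate subproblem solver and adapt the radius based on how well the model predicts the actual reduction in loss.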
Citation: Alnasser, A. (2021). An Empirical Study of the Distributed Ellipsoidal Trust Region Method for Large Batch Training. KAUST Research Repository. https://doi.org/10.25781/KAUST-3IQ6E