99% of Worker-Master Communication in Distributed Optimization Is Not Needed
dc.contributor.author | Mishchenko, Konstantin | |
dc.contributor.author | Hanzely, Filip | |
dc.contributor.author | Richtarik, Peter | |
dc.date.accessioned | 2022-01-12T13:48:15Z | |
dc.date.available | 2022-01-12T13:48:15Z | |
dc.date.issued | 2020 | |
dc.identifier.issn | 2640-3498 | |
dc.identifier.uri | http://hdl.handle.net/10754/674939 | |
dc.description.abstract | In this paper we discuss sparsification of worker-to-server communication in large distributed systems. We improve upon algorithms that fit the following template: a local gradient estimate is computed independently by each worker, then communicated to a master, which subsequently performs averaging. The average is broadcast back to the workers, which use it to perform a gradient-type step to update the local version of the model. We observe that the above template is fundamentally inefficient in that too much data is unnecessarily communicated from the workers to the server, which slows down the overall system. We propose a fix based on a new update-sparsification method we develop in this work, which we suggest using on top of existing methods. Namely, we develop a new variant of parallel block coordinate descent based on independent sparsification of the local gradient estimates before communication. We demonstrate that with only m/n blocks sent by each of n workers, where m is the total number of parameter blocks, the theoretical iteration complexity of the underlying distributed methods is essentially unaffected. As an illustration, this means that when n = 100 parallel workers are used, the communication of 99% of the blocks is redundant, and hence a waste of time. Our theoretical claims are supported through extensive numerical experiments which demonstrate an almost perfect match with our theory on a number of synthetic and real datasets. | |
dc.relation.url | https://proceedings.mlr.press/v124/mishchenko20a.html | |
dc.subject | PARALLEL | |
dc.subject | COORDINATE DESCENT | |
dc.title | 99% of Worker-Master Communication in Distributed Optimization Is Not Needed | |
dc.type | Conference Paper | |
dc.contributor.department | Applied Mathematics and Computational Science Program | |
dc.contributor.department | Computer Science Program | |
dc.contributor.department | Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division | |
dc.contributor.department | Visual Computing Center (VCC) | |
dc.conference.date | AUG 03-06, 2020 | |
dc.conference.name | Conference on Uncertainty in Artificial Intelligence (UAI) | |
dc.conference.location | ELECTR NETWORK | |
dc.identifier.wosut | WOS:000723388600099 | |
dc.eprint.version | Pre-print | |
dc.identifier.volume | 124 | |
dc.identifier.pages | 979-988 | |
kaust.person | Mishchenko, Konstantin | |
kaust.person | Hanzely, Filip | |
kaust.person | Richtarik, Peter |
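The communication pattern described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `sparsify` and `master_average`, the use of scalar blocks, the uniform random block selection, and the plain averaging without rescaling are all assumptions made for illustration. It only shows the bookkeeping: each of n workers sends m/n of its m gradient blocks, and the master averages whatever it receives.

```python
import random

def sparsify(grad_blocks, tau, rng):
    """Keep tau blocks chosen uniformly at random; the rest are not sent.

    Hypothetical helper: the selection rule is an assumption, not taken
    from the paper.
    """
    kept = rng.sample(range(len(grad_blocks)), tau)
    return {j: grad_blocks[j] for j in kept}

def master_average(messages, m):
    """Average sparse messages block by block; unsent blocks count as zero."""
    n = len(messages)
    avg = [0.0] * m
    for msg in messages:
        for j, g in msg.items():
            avg[j] += g / n
    return avg

n, m = 100, 1000          # workers, parameter blocks (illustrative values)
tau = m // n              # blocks sent per worker: m/n, as in the abstract
rng = random.Random(0)

# Stand-in local gradient estimates; each block is a single float here.
local_grads = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]

# Each worker sparsifies independently before communicating.
messages = [sparsify(g, tau, rng) for g in local_grads]
avg = master_average(messages, m)

sent = sum(len(msg) for msg in messages)
fraction = sent / (n * m)
print(f"blocks sent: {sent} of {n * m} ({fraction:.0%})")
```

With n = 100 and m = 1000, each worker sends 10 blocks, so 1% of the dense traffic is communicated, which matches the 99% figure in the title. The sketch says nothing about convergence; the step size and any rescaling that the paper's analysis relies on are not reproduced here.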
This item appears in the following Collection(s)
- Conference Papers
- Applied Mathematics and Computational Science Program
  For more information visit: https://cemse.kaust.edu.sa/amcs
- Computer Science Program
  For more information visit: https://cemse.kaust.edu.sa/cs
- Visual Computing Center (VCC)
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
  For more information visit: https://cemse.kaust.edu.sa/