Type
Preprint
Authors
Sapio, Amedeo
Canini, Marco
Ho, Chen-Yu
Nelson, Jacob
Kalnis, Panos
Kim, Changhoon
Krishnamurthy, Arvind
Moshref, Masoud
Ports, Dan R. K.
Richtarik, Peter
KAUST Department
Computer Science
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Extreme Computing Research Center
Date
2019-02-22
Permanent link to this record
http://hdl.handle.net/10754/653105
Summary
This record has been merged with an existing record at: http://hdl.handle.net/10754/631179.
Abstract
Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and at least by 20% for a number of real-world benchmark models.
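The abstract describes the core idea at a high level: the switch sums each worker's model update chunk by chunk, so a single aggregated stream is returned instead of every worker exchanging its full update with every other. The following is a minimal, illustrative Python sketch of that aggregation pattern only; the names (Switch, all_reduce_via_switch, CHUNK_SIZE, NUM_WORKERS) are hypothetical and do not come from the SwitchML paper or its implementation.

```python
# Illustrative sketch of in-network aggregation (NOT the SwitchML code):
# an idealized "switch" holds one aggregation slot per chunk, adds each
# worker's contribution, and releases the sum once all workers have sent it.
import numpy as np

CHUNK_SIZE = 256      # elements aggregated per "packet" (hypothetical value)
NUM_WORKERS = 4
MODEL_SIZE = 1024

class Switch:
    """Toy stand-in for a programmable switch dataplane."""
    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.slots = {}  # chunk_id -> (accumulated sum, contributions seen)

    def receive(self, chunk_id, values):
        acc, seen = self.slots.get(chunk_id, (np.zeros_like(values), 0))
        acc = acc + values
        seen += 1
        self.slots[chunk_id] = (acc, seen)
        # Once every worker has contributed, the aggregated chunk is "broadcast".
        return acc if seen == self.num_workers else None

def all_reduce_via_switch(worker_updates, switch):
    """Each worker streams its update in chunks; the switch returns each
    aggregated chunk after all workers have contributed to it."""
    aggregated = np.zeros(MODEL_SIZE)
    for start in range(0, MODEL_SIZE, CHUNK_SIZE):
        chunk_id = start // CHUNK_SIZE
        for update in worker_updates:
            result = switch.receive(chunk_id, update[start:start + CHUNK_SIZE])
        aggregated[start:start + CHUNK_SIZE] = result  # last receive completes the slot
    return aggregated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = [rng.standard_normal(MODEL_SIZE) for _ in range(NUM_WORKERS)]
    agg = all_reduce_via_switch(updates, Switch(NUM_WORKERS))
    assert np.allclose(agg, sum(updates))  # switch output equals the plain sum
    print("aggregated first elements:", agg[:3])
```

In this sketch each worker's contribution is consumed at the aggregation point, so only the summed result travels back to the workers, which is the data-volume reduction the abstract refers to.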
Publisher
arXiv
arXiv
1903.06701
Additional Links
https://arxiv.org/abs/1903.06701
https://arxiv.org/pdf/1903.06701