Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

Handle URI:
http://hdl.handle.net/10754/598455
Title:
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms
Authors:
Hasanov, Khalid; Quintin, Jean-Noël; Lastovetsky, Alexey
Abstract:
© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
Citation:
Hasanov K, Quintin J-N, Lastovetsky A (2014) Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms. The Journal of Supercomputing. Available: http://dx.doi.org/10.1007/s11227-014-1133-x.
Publisher:
Springer Science + Business Media
Journal:
The Journal of Supercomputing
Issue Date:
4-Mar-2014
DOI:
10.1007/s11227-014-1133-x
Type:
Article
ISSN:
0920-8542; 1573-0484
Sponsors:
The research in this paper was supported by IRCSET (Irish Research Council for Science, Engineering and Technology) and IBM, grant numbers EPSG/2011/188 and EPSPD/2011/207. Some of the experiments presented in this paper were carried out using the Grid’5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr). Another part of the experiments was carried out using the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia. The authors would like to thank Ashley DeFlumere for her useful comments and corrections.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Hasanov, Khalid | en
dc.contributor.author | Quintin, Jean-Noël | en
dc.contributor.author | Lastovetsky, Alexey | en
dc.date.accessioned | 2016-02-25T13:21:01Z | en
dc.date.available | 2016-02-25T13:21:01Z | en
dc.date.issued | 2014-03-04 | en
dc.identifier.citation | Hasanov K, Quintin J-N, Lastovetsky A (2014) Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms. The Journal of Supercomputing. Available: http://dx.doi.org/10.1007/s11227-014-1133-x. | en
dc.identifier.issn | 0920-8542 | en
dc.identifier.issn | 1573-0484 | en
dc.identifier.doi | 10.1007/s11227-014-1133-x | en
dc.identifier.uri | http://hdl.handle.net/10754/598455 | en
dc.description.abstract | © 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms. | en
dc.description.sponsorship | The research in this paper was supported by IRCSET (Irish Research Council for Science, Engineering and Technology) and IBM, grant numbers EPSG/2011/188 and EPSPD/2011/207. Some of the experiments presented in this paper were carried out using the Grid’5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies (see https://www.grid5000.fr). Another part of the experiments was carried out using the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia. The authors would like to thank Ashley DeFlumere for her useful comments and corrections. | en
dc.publisher | Springer Science + Business Media | en
dc.subject | BlueGene | en
dc.subject | Communication Cost | en
dc.subject | Exascale Computing | en
dc.subject | Grid5000 | en
dc.subject | Hierarchy | en
dc.subject | Matrix Multiplication | en
dc.subject | Parallel Computing | en
dc.title | Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms | en
dc.type | Article | en
dc.identifier.journal | The Journal of Supercomputing | en
dc.contributor.institution | University College Dublin, Dublin, Ireland | en
dc.contributor.institution | Extreme Computing R&D, , France | en