dc.contributor.author: Sukkari, Dalal E.
dc.contributor.author: Ltaief, Hatem
dc.contributor.author: Keyes, David E.
dc.contributor.author: Faverge, Mathieu
dc.date.accessioned: 2019-12-16T13:51:52Z
dc.date.available: 2019-12-16T13:51:52Z
dc.date.issued: 2019-11-13
dc.identifier.citation: Sukkari, D., Ltaief, H., Keyes, D., & Faverge, M. (2019). Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems. 2019 IEEE International Conference on Cluster Computing (CLUSTER). doi:10.1109/cluster.2019.8891024
dc.identifier.doi: 10.1109/CLUSTER.2019.8891024
dc.identifier.uri: http://hdl.handle.net/10754/660619
dc.description.abstract: This paper describes how to leverage a task-based implementation of the polar decomposition on massively parallel systems using the PaRSEC dynamic runtime system. Based on a formulation of the iterative QR Dynamically-Weighted Halley (QDWH) algorithm, our novel implementation reduces data traffic while exploiting high concurrency from the underlying hardware architecture. First, we replace the most time-consuming classical QR factorization phase with a new hierarchical variant, customized for the specific structure of the matrix during the QDWH iterations. The newly developed hierarchical QR for QDWH not only exploits the matrix structure but also shortens the critical path to maximize hardware occupancy. We then deploy PaRSEC to seamlessly orchestrate, pipeline, and track the data dependencies of the various linear algebra building blocks involved in the iterative QDWH algorithm. PaRSEC overlaps communication with computation thanks to its asynchronous scheduling of fine-grained computational tasks. It employs look-ahead techniques to further expose parallelism, while actively pursuing the critical path. In addition, we identify synergistic opportunities between the task-based QDWH algorithm and the PaRSEC framework. We exploit them during the hierarchical QR factorization to enforce locality-aware task execution. This feature minimizes expensive inter-node communication, which is one of the main bottlenecks for scaling up applications on challenging distributed-memory systems. We report numerical accuracy and performance results using well- and ill-conditioned matrices. The benchmarking campaign reveals up to 2X performance speedup against the existing state-of-the-art implementation for the polar decomposition on 36,864 cores.
dc.description.sponsorship: The authors would also like to thank Cray Inc. and Intel in the context of the Cray Center of Excellence and Intel Parallel Computing Center awarded to the Extreme Computing Research Center at KAUST. For computer time, this research used the Shaheen-2 supercomputer hosted at the Supercomputing Laboratory at KAUST, a remote system hosted by our Cray partners, and the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), Universite de Bordeaux, Bordeaux INP and Conseil Regional d’Aquitaine.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.url: https://ieeexplore.ieee.org/document/8891024/
dc.rights: Archived with thanks to IEEE
dc.title: Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems
dc.type: Conference Paper
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Extreme Computing Research Center
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Office of the President
dc.conference.date: 2019-09-23 to 2019-09-26
dc.conference.name: 2019 IEEE International Conference on Cluster Computing, CLUSTER 2019
dc.conference.location: Albuquerque, NM, USA
dc.eprint.version: Post-print
dc.contributor.institution: Univ. of Bordeaux, Bordeaux INP-Inria-CNRS, Talence, 33400 France
kaust.person: Sukkari, Dalal E.
kaust.person: Ltaief, Hatem
kaust.person: Keyes, David E.
refterms.dateFOA: 2019-12-17T06:09:24Z
kaust.acknowledged.supportUnit: Extreme Computing Research Center
kaust.acknowledged.supportUnit: Shaheen
kaust.acknowledged.supportUnit: Supercomputing Laboratory at KAUST
dc.date.published-online: 2019-11-13
dc.date.published-print: 2019-09


Files in this item

Name: qdwh-1file.pdf
Size: 823.6Kb
Format: PDF
Description: Accepted manuscript
