
dc.contributor.author: Ltaief, Hatem
dc.contributor.author: Sukkari, Dalal E.
dc.contributor.author: Esposito, Aniello
dc.contributor.author: Nakatsukasa, Yuji
dc.contributor.author: Keyes, David E.
dc.identifier.citation: Ltaief, H., Sukkari, D., Esposito, A., Nakatsukasa, Y., & Keyes, D. (2019). Massively Parallel Polar Decomposition on Distributed-memory Systems. ACM Transactions on Parallel Computing, 6(1), 1–15. doi:10.1145/3328723
dc.description.abstract: We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building on the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions, introduced by Zolotarev (ZOLO) in 1877, this new PD algorithm, ZOLO-PD, converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate of up to 17, in contrast to the cubic convergence rate of QDWH. This comes at the price of higher arithmetic cost and a larger memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers. We show that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3×, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.
dc.description.sponsorship: The authors thank Cray Inc. and Intel Corp. in the context of the Cray Center of Excellence and Intel Parallel Computing Center awarded to the Extreme Computing Research Center (ECRC) at KAUST. The authors also thank Mustafa Abduljabbar from ECRC for his help in further enhancing the general features of the code. For computer time, this research used the Shaheen supercomputer hosted at the Supercomputing Laboratory at King Abdullah University of Science and Technology (KAUST).
dc.publisher: Association for Computing Machinery (ACM)
dc.rights: © ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Parallel Computing, {[Volume], [Issue], (2019-05-01)}
dc.subject: Polar Decomposition
dc.subject: Zolotarev Functions
dc.subject: Parallel Algorithms
dc.subject: Strong Scaling
dc.subject: Distributed-Memory Systems
dc.title: Massively Parallel Polar Decomposition on Distributed-memory Systems
dc.contributor.department: Applied Mathematics and Computational Science
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Extreme Computing Research Center
dc.identifier.journal: ACM Transactions on Parallel Computing
dc.contributor.institution: Cray EMEA Research Lab, Bristol, UK
dc.contributor.institution: Mathematical Institute, University of Oxford, Oxford, UK
kaust.person: Ltaief, Hatem
kaust.person: Sukkari, Dalal E.
kaust.person: Keyes, David E.
kaust.acknowledged.supportUnit: Extreme Computing Research Center
kaust.acknowledged.supportUnit: Supercomputing Laboratory
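
For readers unfamiliar with the polar decomposition A = U H named in the abstract (U with orthonormal columns, H symmetric positive semidefinite), the following is a minimal NumPy sketch. It is not the authors' code and implements neither QDWH nor ZOLO-PD: it uses the plain, unweighted Halley iteration with an explicit linear solve, which is the cubically convergent scheme that QDWH refines with dynamic weights and QR factorizations. The function names, tolerance, and iteration cap are illustrative assumptions.

```python
import numpy as np


def polar_svd(a):
    """Reference polar decomposition A = U H computed from the SVD."""
    w, s, vh = np.linalg.svd(a, full_matrices=False)
    u = w @ vh                      # polar factor with orthonormal columns
    h = (vh.conj().T * s) @ vh      # H = V diag(s) V^H
    return u, h


def polar_halley(a, tol=1e-12, max_iter=100):
    """Polar decomposition via the unweighted Halley iteration (a sketch).

    Iterates X <- X (3I + X^H X)(I + 3 X^H X)^{-1}, starting from A scaled
    to unit 2-norm. Assumes A has full column rank. The fixed weights
    (3, 1, 3) are what QDWH replaces with dynamically chosen ones.
    """
    n = a.shape[1]
    eye = np.eye(n)
    x = a / np.linalg.norm(a, 2)    # scale so all singular values lie in (0, 1]
    iterations = 0
    for k in range(1, max_iter + 1):
        g = x.conj().T @ x
        # (3I + G) and (I + 3G)^{-1} commute, so one linear solve suffices.
        x_new = x @ np.linalg.solve(eye + 3.0 * g, 3.0 * eye + g)
        step = np.linalg.norm(x_new - x, "fro")
        x = x_new
        iterations = k
        if step <= tol * np.linalg.norm(x, "fro"):
            break
    h = x.conj().T @ a
    h = 0.5 * (h + h.conj().T)      # symmetrize to remove rounding asymmetry
    return x, h, iterations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((8, 5))
    u, h, iterations = polar_halley(a)
    u_ref, _ = polar_svd(a)
    print("iterations:", iterations)
    print("max deviation from SVD-based polar factor:", np.abs(u - u_ref).max())
```

Each pass of this sketch maps a singular value s of the iterate to s(3 + s^2)/(1 + 3s^2), so the iteration count grows with the condition number of A; the abstract's comparison of six QDWH iterations against two for ZOLO-PD concerns the weighted and Zolotarev-based variants, which this sketch does not reproduce.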

Files in this item

Accepted Manuscript
