A High Performance QDWH-SVD Solver using Hardware Accelerators

Handle URI:
http://hdl.handle.net/10754/348632
Title:
A High Performance QDWH-SVD Solver using Hardware Accelerators
Authors:
Sukkari, Dalal E.; Ltaief, Hatem (0000-0002-6897-1095); Keyes, David E. (0000-0002-4052-7224)
Abstract:
This paper describes a new high performance implementation of the QR-based Dynamically Weighted Halley Singular Value Decomposition (QDWH-SVD) solver on multicore architectures enhanced with multiple GPUs. The standard QDWH-SVD algorithm was introduced by Nakatsukasa and Higham (SIAM SISC, 2013) and combines three successive computational stages: (1) the polar decomposition of the original matrix using the QDWH algorithm, (2) the symmetric eigendecomposition of the resulting polar factor to obtain the singular values and the right singular vectors, and (3) a matrix-matrix multiplication to obtain the associated left singular vectors. A comprehensive test suite highlights the numerical robustness of the QDWH-SVD solver. Although QDWH-SVD performs up to twice as many flops as the standard SVD algorithm when computing all singular vectors, our new high performance implementation on a single GPU achieves up to a 3.8x speedup for asymptotic matrix sizes over the equivalent routines from existing state-of-the-art open-source and commercial libraries. However, when only singular values are needed, QDWH-SVD is penalized by performing up to 14 times more flops. Even so, the singular-values-only implementation of QDWH-SVD on a single GPU can run up to 18% faster than the best existing equivalent routines. Integrating mixed precision techniques into the solver can provide up to an additional 40% improvement over full double precision floating-point arithmetic, at the price of losing a few digits of accuracy. We further leverage the single GPU QDWH-SVD implementation by introducing the first multi-GPU SVD solver to study the scalability of the QDWH-SVD framework.
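The three stages listed in the abstract can be illustrated with a small dense NumPy sketch. It is not the authors' multicore/GPU implementation; it assumes a square, nonsingular real matrix, uses only the QR-based form of the QDWH iteration, and treats the scaling bounds and the stopping test as illustrative assumptions. The function names `qdwh_polar` and `qdwh_svd` are hypothetical.

```python
import numpy as np


def qdwh_polar(A, max_iter=20):
    """Sketch of the QR-based dynamically weighted Halley (QDWH) iteration for
    the polar decomposition A = Up @ H of a square, nonsingular real matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eps = np.finfo(float).eps
    eye = np.eye(n)
    # Scale by ||A||_F >= ||A||_2 so every singular value of X is <= 1.
    X = A / np.linalg.norm(A, "fro")
    # Lower bound on sigma_min(X), using ||X^-1||_2 <= sqrt(n) * ||X^-1||_1.
    l = 1.0 / (np.sqrt(n) * np.linalg.norm(np.linalg.inv(X), 1))
    for _ in range(max_iter):
        # Dynamic Halley weights a, b, c chosen from the current bound l.
        d = np.cbrt(4.0 * (1.0 - l**2) / l**4)
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + d))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR of [sqrt(c) X; I] gives X (I + c X^T X)^-1 = Q1 Q2^T / sqrt(c)
        # without forming an explicit inverse.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, eye]))
        Q1, Q2 = Q[:n], Q[n:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        # Advance the lower bound; clamp at 1 to absorb rounding.
        l = min(1.0, l * (a + b * l**2) / (1.0 + c * l**2))
        diff = np.linalg.norm(X_new - X, "fro")
        X = X_new
        # Assumed stopping test: with cubic convergence a small step suffices.
        if diff <= (5.0 * eps) ** (1.0 / 3.0) and abs(1.0 - l) <= 5.0 * eps:
            break
    H = X.T @ A
    return X, (H + H.T) / 2.0  # symmetrize H against rounding


def qdwh_svd(A):
    """Three-stage SVD sketch: QDWH polar decomposition, symmetric
    eigendecomposition of H, then one matrix-matrix multiply."""
    Up, H = qdwh_polar(A)            # stage 1: A = Up @ H
    w, V = np.linalg.eigh(H)         # stage 2: H = V @ diag(w) @ V.T (ascending)
    s, V = w[::-1], V[:, ::-1]       # singular values in descending order
    U = Up @ V                       # stage 3: left singular vectors
    return U, s, V.T


if __name__ == "__main__":
    A = np.random.default_rng(0).standard_normal((8, 8))
    U, s, Vt = qdwh_svd(A)
    print("max reconstruction error:", np.abs(U @ np.diag(s) @ Vt - A).max())
```

For a well-conditioned random matrix, `U @ np.diag(s) @ Vt` should reproduce `A` to near machine precision and `s` should agree with `np.linalg.svd(A, compute_uv=False)`. Stage 3 here is a single matrix-matrix product, and the QR-based iteration in stage 1 is likewise built from dense matrix operations.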
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Issue Date:
8-Apr-2015
Type:
Technical Report
Sponsors:
This work was supported by the Extreme Computing Research Center at KAUST. The authors would like to thank Ahmad Abdelfattah for his help in integrating KBLAS into QDWH-SVD, and NVIDIA for the hardware donations.
Appears in Collections:
Technical Reports; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field: Value [Language]
dc.contributor.author: Sukkari, Dalal E. [en]
dc.contributor.author: Ltaief, Hatem [en]
dc.contributor.author: Keyes, David E. [en]
dc.date.accessioned: 2015-04-08T12:08:12Z [en]
dc.date.available: 2015-04-08T12:08:12Z [en]
dc.date.issued: 2015-04-08 [en]
dc.identifier.uri: http://hdl.handle.net/10754/348632 [en]
dc.subject: Singular Value Decomposition [en]
dc.subject: Polar Decomposition [en]
dc.subject: Symmetric Eigensolver [en]
dc.subject: Mixed Precision Algorithms [en]
dc.subject: GPU-based Scientific Computing [en]
dc.title: A High Performance QDWH-SVD Solver using Hardware Accelerators [en]
dc.type: Technical Report [en]
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division [en]
dc.contributor.institution: Extreme Computing Research Center [en]
This item is licensed under a Creative Commons License
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.