Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

Handle URI:
http://hdl.handle.net/10754/625473
Title:
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
Authors:
Halim Boukaram, Wajih; Turkiyyah, George; Ltaief, Hatem (ORCID 0000-0002-6897-1095); Keyes, David E. (ORCID 0000-0002-4052-7224)
Abstract:
We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
KAUST Department:
Extreme Computing Research Center
Citation:
Halim Boukaram W, Turkiyyah G, Ltaief H, Keyes DE (2017) Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression. Parallel Computing. Available: http://dx.doi.org/10.1016/j.parco.2017.09.001.
Publisher:
Elsevier BV
Journal:
Parallel Computing
Issue Date:
14-Sep-2017
DOI:
10.1016/j.parco.2017.09.001
Type:
Article
ISSN:
0167-8191
Sponsors:
The work of all four authors was supported by the Extreme Computing Research Center at the King Abdullah University of Science and Technology. We thank the NVIDIA Corporation for providing access to the P100 GPU used in this work.
Additional Links:
http://www.sciencedirect.com/science/article/pii/S0167819117301461
Appears in Collections:
Articles; Extreme Computing Research Center
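
Illustrative sketch (not taken from the paper): the abstract above names the one-sided Jacobi algorithm as the building block for the SVD of a batch of small matrices. The pure-Python code below is a minimal CPU sketch of that technique in its textbook (Hestenes) form, assuming dense matrices stored as lists of rows with at least as many rows as columns. The function names `jacobi_svd` and `batched_jacobi_svd`, the tolerance, and the sweep limit are hypothetical choices made for this illustration; they are not the authors' GPU kernels or API.

```python
import math


def jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD of an m x n matrix given as a list of rows (m >= n).

    Returns (u, sigma, v) with a ~= u * diag(sigma) * v^T.
    Singular values are returned unsorted.
    """
    m, n = len(a), len(a[0])
    g = [list(row) for row in a]  # working copy; its columns become u * sigma
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Entries of the 2 x 2 Gram matrix of columns p and q.
                alpha = sum(g[i][p] ** 2 for i in range(m))
                beta = sum(g[i][q] ** 2 for i in range(m))
                gamma = sum(g[i][p] * g[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already numerically orthogonal
                rotated = True
                # Rotation angle that zeroes the off-diagonal Gram entry.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same column rotation to g and to v.
                for mat in (g, v):
                    for row in mat:
                        xp, xq = row[p], row[q]
                        row[p] = c * xp - s * xq
                        row[q] = s * xp + c * xq
        if not rotated:
            break  # a full sweep without rotations: converged

    # Column norms are the singular values; normalized columns form u.
    sigma = [math.sqrt(sum(g[i][j] ** 2 for i in range(m))) for j in range(n)]
    u = [[g[i][j] / sigma[j] if sigma[j] > 0.0 else 0.0 for j in range(n)]
         for i in range(m)]
    return u, sigma, v


def batched_jacobi_svd(batch):
    """Factor every matrix in the batch independently (sequentially here)."""
    return [jacobi_svd(a) for a in batch]
```

Each matrix in the batch is factored independently of the others, and each rotation touches only two columns; this sketch runs everything sequentially and makes no attempt to reproduce the memory-hierarchy-specific kernels the abstract describes.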

Full metadata record

DC Field | Value | Language
dc.contributor.author | Halim Boukaram, Wajih | en
dc.contributor.author | Turkiyyah, George | en
dc.contributor.author | Ltaief, Hatem | en
dc.contributor.author | Keyes, David E. | en
dc.date.accessioned | 2017-09-20T06:02:14Z | -
dc.date.available | 2017-09-20T06:02:14Z | -
dc.date.issued | 2017-09-14 | en
dc.identifier.citation | Halim Boukaram W, Turkiyyah G, Ltaief H, Keyes DE (2017) Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression. Parallel Computing. Available: http://dx.doi.org/10.1016/j.parco.2017.09.001. | en
dc.identifier.issn | 0167-8191 | en
dc.identifier.doi | 10.1016/j.parco.2017.09.001 | en
dc.identifier.uri | http://hdl.handle.net/10754/625473 | -
dc.description.abstract | We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs. | en
dc.description.sponsorship | The work of all four authors was supported by the Extreme Computing Research Center at the King Abdullah University of Science and Technology. We thank the NVIDIA Corporation for providing access to the P100 GPU used in this work. | en
dc.publisher | Elsevier BV | en
dc.relation.url | http://www.sciencedirect.com/science/article/pii/S0167819117301461 | en
dc.rights | NOTICE: this is the author’s version of a work that was accepted for publication in Parallel Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Parallel Computing, [, , (2017-09-14)] DOI: 10.1016/j.parco.2017.09.001 . © 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ | en
dc.subject | GPU | en
dc.subject | QR | en
dc.subject | SVD | en
dc.subject | batched operations | en
dc.subject | hierarchical | en
dc.subject | compression | en
dc.title | Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression | en
dc.type | Article | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Parallel Computing | en
dc.eprint.version | Post-print | en
dc.contributor.institution | Department of Computer Science, American University of Beirut (AUB), Beirut, Lebanon | en
kaust.author | Halim Boukaram, Wajih | en
kaust.author | Ltaief, Hatem | en
kaust.author | Keyes, David E. | en