
dc.contributor.author: Boukaram, Wagih Halim
dc.contributor.author: Turkiyyah, George
dc.contributor.author: Ltaief, Hatem
dc.contributor.author: Keyes, David E.
dc.date.accessioned: 2017-09-20T06:02:14Z
dc.date.available: 2017-09-20T06:02:14Z
dc.date.issued: 2017-09-14
dc.identifier.citation: Halim Boukaram W, Turkiyyah G, Ltaief H, Keyes DE (2017) Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression. Parallel Computing. Available: http://dx.doi.org/10.1016/j.parco.2017.09.001.
dc.identifier.issn: 0167-8191
dc.identifier.doi: 10.1016/j.parco.2017.09.001
dc.identifier.uri: http://hdl.handle.net/10754/625473
dc.description.abstract: We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
dc.description.sponsorship: The work of all four authors was supported by the Extreme Computing Research Center at the King Abdullah University of Science and Technology. We thank the NVIDIA Corporation for providing access to the P100 GPU used in this work.
dc.publisher: Elsevier BV
dc.relation.url: http://www.sciencedirect.com/science/article/pii/S0167819117301461
dc.rights: NOTICE: this is the author’s version of a work that was accepted for publication in Parallel Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Parallel Computing, [, , (2017-09-14)] DOI: 10.1016/j.parco.2017.09.001 . © 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: GPU
dc.subject: QR
dc.subject: SVD
dc.subject: batched operations
dc.subject: hierarchical
dc.subject: compression
dc.title: Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
dc.type: Article
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Extreme Computing Research Center
dc.identifier.journal: Parallel Computing
dc.eprint.version: Post-print
dc.contributor.institution: Department of Computer Science, American University of Beirut (AUB), Beirut, Lebanon
dc.identifier.arxivid: 1707.05141
kaust.person: Boukaram, Wagih Halim
kaust.person: Ltaief, Hatem
kaust.person: Keyes, David E.
refterms.dateFOA: 2019-09-14T00:00:00Z


Files in this item

Name: 1-s2.0-S0167819117301461-main.pdf
Size: 753.4Kb
Format: PDF
Description: Accepted Manuscript
