Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
Name:
1-s2.0-S0167819117301461-main.pdf
Size:
753.4Kb
Format:
PDF
Description:
Accepted Manuscript
Type
Article
KAUST Department
Applied Mathematics and Computational Science Program
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Extreme Computing Research Center
Date
2017-09-14
Permanent link to this record
http://hdl.handle.net/10754/625473
Abstract
We present high-performance implementations of the QR decomposition and the singular value decomposition (SVD) of a batch of small matrices hosted on the GPU, with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used, for its simplicity and inherent parallelism, as a building block for the SVD of low-rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside, and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
Citation
Halim Boukaram W, Turkiyyah G, Ltaief H, Keyes DE (2017) Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression. Parallel Computing. Available: http://dx.doi.org/10.1016/j.parco.2017.09.001.
Sponsors
The work of all four authors was supported by the Extreme Computing Research Center at the King Abdullah University of Science and Technology. We thank the NVIDIA Corporation for providing access to the P100 GPU used in this work.
Publisher
Elsevier BV
Journal
Parallel Computing
arXiv
1707.05141
Additional Links
http://www.sciencedirect.com/science/article/pii/S0167819117301461
10.1016/j.parco.2017.09.001
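The abstract names the one-sided Jacobi algorithm as the SVD building block applied to a batch of small matrices. The following is a minimal serial sketch of the textbook one-sided (Hestenes) Jacobi iteration in plain Python, assuming each matrix is m x n with m >= n and stored as a list of rows. It is not the authors' GPU implementation: the function names `jacobi_svd`, `batched_jacobi_svd`, and `reconstruct`, the tolerance, and the sweep limit are hypothetical choices for illustration, and the batch is processed with an ordinary loop rather than GPU kernels.

```python
import math

def jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided (Hestenes) Jacobi SVD of a small m x n matrix A (m >= n),
    given as a list of rows. Returns (U, S, V) where U and V are lists of
    columns and A ~= U diag(S) V^T. Hypothetical sketch, not the paper's code."""
    m, n = len(A), len(A[0])
    # G holds the columns of A; V accumulates the same plane rotations,
    # so the invariant G = A V holds throughout.
    G = [[A[i][j] for i in range(m)] for j in range(n)]
    V = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in G[p])
                beta = sum(x * x for x in G[q])
                gamma = sum(x * y for x, y in zip(G[p], G[q]))
                # Skip pairs whose columns are already numerically orthogonal.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                converged = False
                # Rotation (c, s) that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for cols in (G, V):
                    cp, cq = cols[p], cols[q]
                    cols[p] = [c * x - s * y for x, y in zip(cp, cq)]
                    cols[q] = [s * x + c * y for x, y in zip(cp, cq)]
        if converged:
            break
    # Singular values are the column norms of G; normalized columns give U.
    norms = [math.sqrt(sum(x * x for x in g)) for g in G]
    order = sorted(range(n), key=lambda j: -norms[j])
    S = [norms[j] for j in order]
    # Zero singular values yield zero U columns in this simplified sketch.
    U = [[x / S[k] if S[k] > 0 else 0.0 for x in G[j]] for k, j in enumerate(order)]
    V = [V[j] for j in order]
    return U, S, V

def batched_jacobi_svd(batch, **kw):
    """Apply the SVD independently to every matrix in the batch. The matrices
    are independent, which is what makes batching natural on a GPU; this
    sketch simply loops over them on the CPU."""
    return [jacobi_svd(A, **kw) for A in batch]

def reconstruct(U, S, V):
    """Form U diag(S) V^T from the column-list representation."""
    m, n = len(U[0]), len(V[0])
    return [[sum(U[k][i] * S[k] * V[k][j] for k in range(len(S)))
             for j in range(n)] for i in range(m)]

batch = [
    [[3.0, 0.0], [4.0, 5.0]],
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
]
for A, (U, S, V) in zip(batch, batched_jacobi_svd(batch)):
    R = reconstruct(U, S, V)
    err = max(abs(R[i][j] - A[i][j]) for i in range(len(A)) for j in range(len(A[0])))
    print("singular values:", S, "max reconstruction error:", err)
```

Each sweep visits every column pair once; because every rotation touches only two columns, independent pairs could in principle be processed concurrently, which is the kind of inherent parallelism the abstract refers to. For the first matrix, the singular values should come out as 3*sqrt(5) and sqrt(5), since A^T A has eigenvalues 45 and 5.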