High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications
Type
Conference Paper
KAUST Department
Applied Mathematics and Computational Science Program
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Extreme Computing Research Center
Date
2015-07-25
Online Publication Date
2015-07-25
Print Publication Date
2015
Permanent link to this record
http://hdl.handle.net/10754/565820
Abstract
Leveraging optimization techniques (e.g., register blocking and double buffering) introduced in the context of KBLAS, a high-performance Level 2 BLAS library for GPUs, the authors implement dense matrix-vector multiplications within a sparse-block structure. While these optimizations are important for high-performance dense kernels, they are even more critical for sparse linear algebra operations. The most time-consuming phase of many multicomponent applications, such as models of reacting flows or petroleum reservoirs, is the solution, at each implicit time step, of large, sparse, spatially structured or unstructured linear systems. The standard method is a preconditioned Krylov solver. Sparse Matrix-Vector multiplication (SpMV) is, in turn, one of the most time-consuming operations in such solvers. Because there is no reuse of matrix elements within a single SpMV, kernel performance is limited by the speed at which data can be transferred from memory to registers, making bus bandwidth the major bottleneck. On the other hand, in the case of a multi-species model, the resulting Jacobian has a dense block structure. For contemporary petroleum reservoir simulations, the block size typically ranges from three to a few dozen across different models, and still larger blocks are relevant within adaptively model-refined regions of the domain, though the block size, which is related to the number of conserved species, is generally constant over large regions within a given model. This structure can be exploited beyond the convenience of a block compressed row data format, because it offers opportunities to hide data motion behind useful computation. The new SpMV kernel outperforms existing state-of-the-art implementations on single and multiple GPUs using matrices with dense block structure representative of porous media applications with both structured and unstructured multi-component grids.
Citation
Abdelfattah, Ahmad, Hatem Ltaief, and David Keyes. "High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications." In Euro-Par 2015: Parallel Processing, pp. 601-612. Springer Berlin Heidelberg, 2015.
Publisher
Springer Nature
Conference/Event name
21st International Conference on Parallel and Distributed Computing, Euro-Par 2015
Additional Links
http://link.springer.com/chapter/10.1007%2F978-3-662-48096-0_46
10.1007/978-3-662-48096-0_46
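The abstract's argument about block storage and memory traffic can be made concrete with a small sketch. The Python code below is a minimal CPU reference for SpMV in block compressed row (BSR) format. It is not the authors' GPU kernel and omits their register blocking and double buffering. It assumes square b×b blocks stored row-major, and the function names (`bsr_spmv`, `matrix_bytes_per_flop_csr`, `matrix_bytes_per_flop_bsr`) are hypothetical. The traffic estimate further assumes 8-byte values and 4-byte column indices and counts only matrix data, not the vectors or row pointers.

```python
"""Minimal CPU reference sketch of block compressed row (BSR) SpMV.

Illustrative only: helper names are hypothetical, and this is not the
GPU kernel described in the paper.
"""
from typing import List


def bsr_spmv(b: int, row_ptr: List[int], col_idx: List[int],
             blocks: List[List[float]], x: List[float]) -> List[float]:
    """Compute y = A x for A stored in BSR format.

    b       -- block size (e.g., number of conserved species per cell)
    row_ptr -- block-row pointers, length (number of block rows + 1)
    col_idx -- block-column index of each stored block
    blocks  -- each stored block as a flat, row-major list of b*b values
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * b)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]  # one index load serves all b*b block values
            blk = blocks[k]
            # Dense b x b mat-vec on one block; a GPU implementation would
            # typically apply register blocking at this level.
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += blk[r * b + c] * x[bj * b + c]
                y[bi * b + r] += acc
    return y


def matrix_bytes_per_flop_csr(value_bytes: int = 8, index_bytes: int = 4) -> float:
    # Assumed CSR cost: each nonzero loads one value and one column index
    # and performs 2 flops (one multiply, one add).
    return (value_bytes + index_bytes) / 2


def matrix_bytes_per_flop_bsr(b: int, value_bytes: int = 8,
                              index_bytes: int = 4) -> float:
    # Assumed BSR cost: each b x b block loads b*b values and one column
    # index and performs 2*b*b flops.
    return (value_bytes * b * b + index_bytes) / (2 * b * b)


if __name__ == "__main__":
    # A 4 x 4 matrix with b = 2: block row 0 holds blocks in block
    # columns 0 and 1; block row 1 holds one block in block column 1.
    row_ptr = [0, 2, 3]
    col_idx = [0, 1, 1]
    blocks = [[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 1.0, 0.0],
              [2.0, 0.0, 0.0, 2.0]]
    x = [1.0, 1.0, 1.0, 1.0]
    print(bsr_spmv(2, row_ptr, col_idx, blocks, x))
    print("csr", matrix_bytes_per_flop_csr())
    for b in (1, 3, 8):
        print("bsr b =", b, matrix_bytes_per_flop_bsr(b))
```

For the 4×4 example, `bsr_spmv` returns `[4.0, 8.0, 2.0, 2.0]`. Under the stated assumptions, CSR moves 6 bytes of matrix data per flop, while BSR approaches 4 bytes per flop as b grows (4.03125 for b = 8). Blocking removes most of the index traffic, but the kernel remains bandwidth-bound, which is consistent with the abstract's point that the block structure must be exploited beyond the storage format itself, for example by overlapping data motion with computation.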