High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications

Handle URI:
http://hdl.handle.net/10754/565820
Title:
High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications
Authors:
Abdelfattah, Ahmad (0000-0001-5054-4784); Ltaief, Hatem (0000-0002-6897-1095); Keyes, David E. (0000-0002-4052-7224)
Abstract:
Leveraging optimization techniques (e.g., register blocking and double buffering) introduced in the context of KBLAS, a Level 2 BLAS high performance library on GPUs, the authors implement dense matrix-vector multiplications within a sparse-block structure. While these optimizations are important for high performance dense kernel executions, they are even more critical when dealing with sparse linear algebra operations. The most time-consuming phase of many multicomponent applications, such as models of reacting flows or petroleum reservoirs, is the solution at each implicit time step of large, sparse spatially structured or unstructured linear systems. The standard method is a preconditioned Krylov solver. The Sparse Matrix-Vector multiplication (SpMV) is, in turn, one of the most time-consuming operations in such solvers. Because there is no data reuse of the elements of the matrix within a single SpMV, kernel performance is limited by the speed at which data can be transferred from memory to registers, making the bus bandwidth the major bottleneck. On the other hand, in case of a multi-species model, the resulting Jacobian has a dense block structure. For contemporary petroleum reservoir simulations, the block size typically ranges from three to a few dozen among different models, and still larger blocks are relevant within adaptively model-refined regions of the domain, though generally the size of the blocks, related to the number of conserved species, is constant over large regions within a given model. This structure can be exploited beyond the convenience of a block compressed row data format, because it offers opportunities to hide the data motion with useful computations. The new SpMV kernel outperforms existing state-of-the-art implementations on single and multi-GPUs using matrices with dense block structure representative of porous media applications with both structured and unstructured multi-component grids.
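As a rough illustration of the block compressed row idea mentioned in the abstract, the following is a minimal serial sketch of an SpMV in which every stored nonzero is a small dense block, so the inner work is a dense matrix-vector product. It is not the authors' GPU kernel and uses none of the register blocking or double buffering they describe; the function name `bsr_spmv`, the array names `row_ptr`, `col_idx`, and `blocks`, and the row-major block layout are assumptions made for this example only.

```python
def bsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Compute y = A @ x for A stored in block compressed sparse row form.

    Assumed layout (hypothetical, for illustration):
      row_ptr[i]..row_ptr[i + 1] index the stored blocks of block row i,
      col_idx[k] is the block column of stored block k,
      blocks[k] is a dense block_size x block_size block, row-major, flat.
    """
    b = block_size
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * b)
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            blk = blocks[k]
            # Dense b x b matrix-vector product on one stored block.
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += blk[r * b + c] * x[j * b + c]
                y[i * b + r] += acc
    return y


# A 4 x 4 matrix made of 2 x 2 blocks, with the (1, 0) block absent:
# A = [[1, 2, 0, 1],
#      [3, 4, 1, 0],
#      [0, 0, 2, 0],
#      [0, 0, 0, 2]]
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 1.0, 0.0],
    [2.0, 0.0, 0.0, 2.0],
]
x = [1.0, 1.0, 1.0, 1.0]
y = bsr_spmv(2, row_ptr, col_idx, blocks, x)
print(y)  # [4.0, 8.0, 2.0, 2.0]
```

Each matrix entry is read exactly once, which is the lack of data reuse the abstract points to. One column index is stored per block rather than per entry, so the index traffic per floating-point operation shrinks as the block size grows.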
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Extreme Computing Research Center
Citation:
Abdelfattah, Ahmad, Hatem Ltaief, and David Keyes. "High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications." In Euro-Par 2015: Parallel Processing, pp. 601-612. Springer Berlin Heidelberg, 2015
Publisher:
Springer Science + Business Media
Journal:
Euro-Par 2015: Parallel Processing
Conference/Event name:
21st International Conference on Parallel and Distributed Computing, Euro-Par 2015
Issue Date:
25-Jul-2015
DOI:
10.1007/978-3-662-48096-0_46
Type:
Conference Paper
ISSN:
0302-9743
Additional Links:
http://link.springer.com/chapter/10.1007%2F978-3-662-48096-0_46
Appears in Collections:
Conference Papers; Extreme Computing Research Center; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Abdelfattah, Ahmad | en
dc.contributor.author | Ltaief, Hatem | en
dc.contributor.author | Keyes, David E. | en
dc.date.accessioned | 2015-08-11T10:14:54Z | en
dc.date.available | 2015-08-11T10:14:54Z | en
dc.date.issued | 2015-07-25 | en
dc.identifier.citation | Abdelfattah, Ahmad, Hatem Ltaief, and David Keyes. "High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications." In Euro-Par 2015: Parallel Processing, pp. 601-612. Springer Berlin Heidelberg, 2015 | en
dc.identifier.issn | 0302-9743 | en
dc.identifier.doi | 10.1007/978-3-662-48096-0_46 | en
dc.identifier.uri | http://hdl.handle.net/10754/565820 | en
dc.description.abstract | Leveraging optimization techniques (e.g., register blocking and double buffering) introduced in the context of KBLAS, a Level 2 BLAS high performance library on GPUs, the authors implement dense matrix-vector multiplications within a sparse-block structure. While these optimizations are important for high performance dense kernel executions, they are even more critical when dealing with sparse linear algebra operations. The most time-consuming phase of many multicomponent applications, such as models of reacting flows or petroleum reservoirs, is the solution at each implicit time step of large, sparse spatially structured or unstructured linear systems. The standard method is a preconditioned Krylov solver. The Sparse Matrix-Vector multiplication (SpMV) is, in turn, one of the most time-consuming operations in such solvers. Because there is no data reuse of the elements of the matrix within a single SpMV, kernel performance is limited by the speed at which data can be transferred from memory to registers, making the bus bandwidth the major bottleneck. On the other hand, in case of a multi-species model, the resulting Jacobian has a dense block structure. For contemporary petroleum reservoir simulations, the block size typically ranges from three to a few dozen among different models, and still larger blocks are relevant within adaptively model-refined regions of the domain, though generally the size of the blocks, related to the number of conserved species, is constant over large regions within a given model. This structure can be exploited beyond the convenience of a block compressed row data format, because it offers opportunities to hide the data motion with useful computations. The new SpMV kernel outperforms existing state-of-the-art implementations on single and multi-GPUs using matrices with dense block structure representative of porous media applications with both structured and unstructured multi-component grids. | en
dc.language.iso | en | en
dc.publisher | Springer Science + Business Media | en
dc.relation.url | http://link.springer.com/chapter/10.1007%2F978-3-662-48096-0_46 | en
dc.rights | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-662-48096-0_46 | en
dc.title | High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Euro-Par 2015: Parallel Processing | en
dc.conference.date | 2015-08-24 to 2015-08-28 | en
dc.conference.name | 21st International Conference on Parallel and Distributed Computing, Euro-Par 2015 | en
dc.conference.location | Vienna, AUT | en
dc.eprint.version | Post-print | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Abdelfattah, Ahmad | en
kaust.author | Ltaief, Hatem | en
kaust.author | Keyes, David E. | en