Optimizing memory-bound SYMV kernel on GPU hardware accelerators

Handle URI:
http://hdl.handle.net/10754/564662
Title:
Optimizing memory-bound SYMV kernel on GPU hardware accelerators
Authors:
Abdelfattah, Ahmad M.; Dongarra, Jack; Keyes, David E. (0000-0002-4052-7224); Ltaief, Hatem (0000-0002-6897-1095)
Abstract:
Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount for improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical kernel for computing the symmetric matrix-vector product (SYMV) on NVIDIA Fermi GPUs. Because of its inherently memory-bound nature, this kernel is performance-critical in the tridiagonalization of a symmetric dense matrix, which is a preprocessing step for calculating the eigenpairs. Using a novel design that addresses the irregular memory accesses by hiding latency and increasing bandwidth, our preliminary asymptotic results show 3.5x and 2.5x speedups over the corresponding CUBLAS 4.0 kernel, and 7-8% and 30% improvements over the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library, in single and double precision arithmetic, respectively. © 2013 Springer-Verlag.
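For readers unfamiliar with the kernel, the following is a plain CPU reference for the operation the abstract describes, y := alpha*A*x + beta*y with A symmetric. It is not the authors' GPU kernel: it is a minimal sketch that follows the loop structure of the reference BLAS DSYMV for the lower-triangular case, and the function name symv_lower is hypothetical. Only the lower triangle of the column-major matrix is read, and each off-diagonal element a_ij is loaded once but used twice (for y_i and, by symmetry, as a_ji for y_j). This halves the matrix traffic compared with a general matrix-vector product, which matters because the kernel performs only about 2n² flops on roughly n²/2 loaded matrix elements.

```c
#include <stddef.h>

/* Hypothetical reference routine (not the paper's kernel):
 * y := alpha*A*x + beta*y for a symmetric n x n matrix A stored
 * column-major with leading dimension lda >= n.
 * Only the lower triangle (row index i >= column index j) is accessed. */
void symv_lower(size_t n, double alpha, const double *A, size_t lda,
                const double *x, double beta, double *y)
{
    /* Scale y by beta; as in BLAS, beta == 0 overwrites y so that
     * uninitialized values (e.g. NaN) are not propagated. */
    for (size_t i = 0; i < n; ++i)
        y[i] = (beta == 0.0) ? 0.0 : beta * y[i];

    for (size_t j = 0; j < n; ++j) {
        double temp1 = alpha * x[j]; /* multiplies column j of A */
        double temp2 = 0.0;          /* sum over i > j of a_ji * x_i, read from column j */

        y[j] += temp1 * A[j + j * lda]; /* diagonal element a_jj */
        for (size_t i = j + 1; i < n; ++i) {
            double a_ij = A[i + j * lda]; /* loaded once from memory ...   */
            y[i] += temp1 * a_ij;         /* ... used as a_ij for y_i ...  */
            temp2 += a_ij * x[i];         /* ... and as a_ji for y_j       */
        }
        y[j] += alpha * temp2;
    }
}
```

For example, with the lower triangle of the 3x3 matrix [[2,1,0],[1,3,4],[0,4,5]] stored column-major (the strictly upper entries may hold arbitrary values, since they are never read), x = (1, 2, 3), alpha = 1 and beta = 0, the call produces y = (4, 19, 23). In double precision this amounts to about 2n² flops over 4n² bytes of matrix data, roughly 0.5 flop per byte (0.25 for a general GEMV that reads all n² entries), far below the flop-to-byte ratio a Fermi-class GPU can sustain, which is why the kernel is limited by memory bandwidth rather than arithmetic.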
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; KAUST Supercomputing Laboratory (KSL); Applied Mathematics and Computational Science Program; Extreme Computing Research Center
Publisher:
Springer Science + Business Media
Journal:
High Performance Computing for Computational Science - VECPAR 2012
Conference/Event name:
10th International Conference on High Performance Computing for Computational Science, VECPAR 2012
Issue Date:
2013
DOI:
10.1007/978-3-642-38718-0_10
Type:
Conference Paper
ISSN:
0302-9743
ISBN:
9783642387173
Appears in Collections:
Conference Papers; Applied Mathematics and Computational Science Program; KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Abdelfattah, Ahmad M. | en
dc.contributor.author | Dongarra, Jack | en
dc.contributor.author | Keyes, David E. | en
dc.contributor.author | Ltaief, Hatem | en
dc.date.accessioned | 2015-08-04T07:11:23Z | en
dc.date.available | 2015-08-04T07:11:23Z | en
dc.date.issued | 2013 | en
dc.identifier.isbn | 9783642387173 | en
dc.identifier.issn | 03029743 | en
dc.identifier.doi | 10.1007/978-3-642-38718-0_10 | en
dc.identifier.uri | http://hdl.handle.net/10754/564662 | en
dc.description.abstract | Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount for improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical kernel for computing the symmetric matrix-vector product (SYMV) on NVIDIA Fermi GPUs. Because of its inherently memory-bound nature, this kernel is performance-critical in the tridiagonalization of a symmetric dense matrix, which is a preprocessing step for calculating the eigenpairs. Using a novel design that addresses the irregular memory accesses by hiding latency and increasing bandwidth, our preliminary asymptotic results show 3.5x and 2.5x speedups over the corresponding CUBLAS 4.0 kernel, and 7-8% and 30% improvements over the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library, in single and double precision arithmetic, respectively. © 2013 Springer-Verlag. | en
dc.publisher | Springer Science + Business Media | en
dc.title | Optimizing memory-bound SYMV kernel on GPU hardware accelerators | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | KAUST Supercomputing Laboratory (KSL) | en
dc.contributor.department | Applied Mathematics and Computational Science Program | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | High Performance Computing for Computational Science - VECPAR 2012 | en
dc.conference.date | 17 July 2012 through 20 July 2012 | en
dc.conference.name | 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012 | en
dc.conference.location | Kobe | en
dc.contributor.institution | Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, United States | en
kaust.author | Abdelfattah, Ahmad M. | en
kaust.author | Keyes, David E. | en
kaust.author | Ltaief, Hatem | en