dc.contributor.author: Abdelfattah, Ahmad
dc.contributor.author: Dongarra, Jack
dc.contributor.author: Keyes, David E.
dc.contributor.author: Ltaief, Hatem
dc.date.accessioned: 2015-08-04T07:11:23Z
dc.date.available: 2015-08-04T07:11:23Z
dc.date.issued: 2013
dc.identifier.isbn: 9783642387173
dc.identifier.issn: 03029743
dc.identifier.doi: 10.1007/978-3-642-38718-0_10
dc.identifier.uri: http://hdl.handle.net/10754/564662
dc.description.abstract: Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical kernel for computing the symmetric matrix-vector product on nVidia Fermi GPUs. Due to its inherent memory-bound nature, this kernel is critical in the tridiagonalization of a symmetric dense matrix, which is a preprocessing step to calculate the eigenpairs. Using a novel design to address the irregular memory accesses by hiding latency and increasing bandwidth, our preliminary asymptotic results show 3.5x and 2.5x speedups over the similar CUBLAS 4.0 kernel, and 7-8% and 30% improvements over the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library in single and double precision arithmetic, respectively. © 2013 Springer-Verlag.
dc.publisher: Springer Nature
dc.title: Optimizing memory-bound SYMV kernel on GPU hardware accelerators
dc.type: Conference Paper
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Extreme Computing Research Center
dc.contributor.department: KAUST Supercomputing Laboratory (KSL)
dc.identifier.journal: High Performance Computing for Computational Science - VECPAR 2012
dc.conference.date: 17 July 2012 through 20 July 2012
dc.conference.name: 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012
dc.conference.location: Kobe
dc.contributor.institution: Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, United States
kaust.person: Abdelfattah, Ahmad
kaust.person: Keyes, David E.
kaust.person: Ltaief, Hatem

