Enabling High Performance Large Scale Dense Problems through KBLAS

Handle URI:
http://hdl.handle.net/10754/624931
Title:
Enabling High Performance Large Scale Dense Problems through KBLAS
Authors:
Abdelfattah, Ahmad (0000-0001-5054-4784); Keyes, David E. (0000-0002-4052-7224); Ltaief, Hatem (0000-0002-6897-1095)
Abstract:
KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines for GPU-accelerated systems. KBLAS is written entirely in CUDA C and targets NVIDIA GPUs of compute capability 2.0 (Fermi) or higher. The current focus is on level-2 BLAS routines, namely the general matrix-vector multiplication (GEMV) kernel and the symmetric/Hermitian matrix-vector multiplication (SYMV/HEMV) kernel. KBLAS provides both kernels in all four precisions (s, d, c, and z), with support for multi-GPU systems. Through advanced optimization techniques that hide latency and push memory bandwidth to its limit, KBLAS outperforms state-of-the-art kernels by 20-90%. Competitors include CUBLAS-5.5, MAGMABLAS-1.4.0, and CULA R17. The SYMV/HEMV kernel from KBLAS has been adopted by NVIDIA and should appear in CUBLAS-6.0. KBLAS has been used in large-scale simulations of multi-object adaptive optics.
KAUST Department:
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Conference/Event name:
SHAXC-2 Workshop 2014
Issue Date:
4-May-2014
Type:
Poster
Appears in Collections:
Posters; Scalable Hierarchical Algorithms for eXtreme Computing (SHAXC-2) Workshop 2014

Full metadata record

DC Field | Value | Language
dc.contributor.author | Abdelfattah, Ahmad | en
dc.contributor.author | Keyes, David E. | en
dc.contributor.author | Ltaief, Hatem | en
dc.date.accessioned | 2017-06-12T10:24:00Z | -
dc.date.available | 2017-06-12T10:24:00Z | -
dc.date.issued | 2014-05-04 | -
dc.identifier.uri | http://hdl.handle.net/10754/624931 | -
dc.title | Enabling High Performance Large Scale Dense Problems through KBLAS | en
dc.type | Poster | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences & Engineering (CEMSE) | en
dc.conference.date | May 4-6, 2014 | en
dc.conference.name | SHAXC-2 Workshop 2014 | en
dc.conference.location | KAUST | en
kaust.author | Abdelfattah, Ahmad | en
kaust.author | Keyes, David E. | en
kaust.author | Ltaief, Hatem | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.