
dc.contributor.author        Charara, Ali
dc.contributor.author        Keyes, David E.
dc.contributor.author        Ltaief, Hatem
dc.date.accessioned          2017-05-04T12:33:22Z
dc.date.available            2017-05-04T12:33:22Z
dc.date.issued               2017-03-13
dc.identifier.uri            http://hdl.handle.net/10754/623339
dc.description.abstract      In several scientific applications, such as tensor contractions in deep learning or data compression in hierarchical low-rank matrix approximation, the bulk of the computation typically resides in performing thousands of independent dense linear algebra operations on very small matrices (usually of size less than 100). Batched dense linear algebra kernels are becoming ubiquitous for such scientific computations. Within a single API call, these kernels simultaneously launch a large number of similar matrix computations, removing the expensive overhead of multiple API calls while increasing the utilization of the underlying hardware.
dc.title                     Batched Triangular DLA for Very Small Matrices on GPUs
dc.type                      Poster
dc.contributor.department    Applied Mathematics and Computational Science Program
dc.contributor.department    Computer Science Program
dc.contributor.department    Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department    Extreme Computing Research Center
dc.conference.date           March 13-15, 2017
dc.conference.name           High Performance Computing Saudi Arabia (HPC Saudi) 2017
dc.conference.location       KAUST
kaust.person                 Charara, Ali
kaust.person                 Keyes, David E.
kaust.person                 Ltaief, Hatem
refterms.dateFOA             2018-06-13T16:25:34Z
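
The batched-kernel idea described in the abstract (a single API call launching thousands of small, independent dense linear algebra operations) can be illustrated with the following sketch, which solves a batch of small lower-triangular systems through cuBLAS's cublasDtrsmBatched routine. This is an illustration only, not the poster's implementation; the matrix size, batch count, and triangular-solve parameters are assumptions chosen to match the "very small matrices" setting.

/*
 * Minimal sketch: one API call solves `batch` independent small
 * triangular systems A_i * X_i = B_i on the GPU.
 * Uses cuBLAS's cublasDtrsmBatched purely for illustration; sizes and
 * parameters below are assumptions, not the poster's configuration.
 */
#include <cstdlib>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 32;          /* assumed small matrix size (< 100) */
    const int batch = 10000;   /* assumed number of independent problems */

    /* One contiguous slab per operand; the batched API takes arrays of
       per-matrix device pointers, and that pointer array itself must
       reside in device memory. */
    double *dA, *dB;
    cudaMalloc((void**)&dA, (size_t)batch * n * n * sizeof(double));
    cudaMalloc((void**)&dB, (size_t)batch * n * n * sizeof(double));

    double **hAptrs = (double**)malloc(batch * sizeof(double*));
    double **hBptrs = (double**)malloc(batch * sizeof(double*));
    for (int i = 0; i < batch; ++i) {
        hAptrs[i] = dA + (size_t)i * n * n;
        hBptrs[i] = dB + (size_t)i * n * n;
    }
    double **dAptrs, **dBptrs;
    cudaMalloc((void**)&dAptrs, batch * sizeof(double*));
    cudaMalloc((void**)&dBptrs, batch * sizeof(double*));
    cudaMemcpy(dAptrs, hAptrs, batch * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(dBptrs, hBptrs, batch * sizeof(double*), cudaMemcpyHostToDevice);

    /* ... fill dA (lower-triangular factors) and dB (right-hand sides) ... */

    cublasHandle_t handle;
    cublasCreate(&handle);

    const double alpha = 1.0;
    /* Single API call: triangular solves for all `batch` problems at once. */
    cublasDtrsmBatched(handle,
                       CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_LOWER,
                       CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT,
                       n, n, &alpha,
                       (const double * const *)dAptrs, n,
                       dBptrs, n,
                       batch);
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dAptrs); cudaFree(dBptrs);
    free(hAptrs); free(hBptrs);
    return 0;
}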

