High Productivity Programming of Dense Linear Algebra on Heterogeneous NUMA Architectures

Handle URI:
http://hdl.handle.net/10754/297194
Title:
High Productivity Programming of Dense Linear Algebra on Heterogeneous NUMA Architectures
Authors:
Alomairy, Rabab M.
Abstract:
High-end multicore systems with GPU-based accelerators are now ubiquitous in the hardware landscape. Besides dealing with the nontrivial heterogeneous environment, end users must often take the underlying memory architecture into account to reduce the overhead of data motion, especially when running on non-uniform memory access (NUMA) platforms. We propose an approach based on the OmpSs parallel programming model and its Nanos++ dynamic runtime system to address these two challenges through 1) an innovative NUMA node-aware scheduling policy that reduces data movement between NUMA nodes and 2) a nested parallelism feature that concurrently exploits the resources of the GPU devices and the CPU host without compromising overall performance. Our approach features separation of concerns by abstracting the complexity of the hardware from end users, so that high productivity can be achieved. The Cholesky factorization is used as a benchmark representative of dense numerical linear algebra algorithms. Superior performance is also demonstrated on symmetric matrix inversion based on Cholesky factorization, commonly used in covariance computations in statistics. Performance on a NUMA system with Kepler-based GPUs exceeds that of existing implementations, while the OmpSs-enabled code remains very similar to its original sequential version.
Advisors:
Keyes, David E. ( 0000-0002-4052-7224 )
Committee Member:
Ltaief, Hatem ( 0000-0002-6897-1095 ) ; Moshkov, Mikhail ( 0000-0003-0085-9483 ) ; Shihada, Basem ( 0000-0003-4434-4334 )
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
Jul-2013
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
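
Illustrative sketch (not part of the record):
The abstract names Cholesky factorization as the benchmark and notes that the OmpSs-enabled code stays close to its sequential version. As a rough illustration of what such a sequential starting point might look like, the following is a minimal pure-Python sketch of a right-looking tiled Cholesky factorization. It is not the thesis code: the function names (`potrf`, `trsm`, `update`, `tiled_cholesky`) and the tile size parameter `nb` are hypothetical choices for this example, and no OmpSs or Nanos++ API is used. In a task-based model, each kernel call in the main loop would presumably become a task whose data dependencies are the tiles it reads and writes.

```python
import math


def potrf(A, k0, k1):
    """Factor the diagonal tile A[k0:k1, k0:k1] in place (lower triangle)."""
    for j in range(k0, k1):
        s = A[j][j] - sum(A[j][p] ** 2 for p in range(k0, j))
        A[j][j] = math.sqrt(s)  # fails if the matrix is not positive definite
        for i in range(j + 1, k1):
            t = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k0, j))
            A[i][j] = t / A[j][j]


def trsm(A, k0, k1, i0, i1):
    """Solve X * L_kk^T = A_ik in place for the tile in rows i0:i1, cols k0:k1."""
    for i in range(i0, i1):
        for j in range(k0, k1):
            t = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k0, j))
            A[i][j] = t / A[j][j]


def update(A, i0, i1, j0, j1, k0, k1):
    """Trailing update A_ij -= L_ik * L_jk^T, lower triangle only.

    Plays the role of SYRK on diagonal tiles and GEMM on off-diagonal tiles.
    """
    for i in range(i0, i1):
        for j in range(j0, min(j1, i + 1)):
            A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k0, k1))


def tiled_cholesky(A, nb):
    """Return lower-triangular L with L * L^T = A, using tiles of size nb."""
    A = [row[:] for row in A]  # work on a copy; the input is left untouched
    n = len(A)
    bounds = [(s, min(s + nb, n)) for s in range(0, n, nb)]
    for k, (k0, k1) in enumerate(bounds):
        potrf(A, k0, k1)                      # one task per diagonal tile
        for (i0, i1) in bounds[k + 1:]:
            trsm(A, k0, k1, i0, i1)           # independent tasks per tile row
        for i, (i0, i1) in enumerate(bounds[k + 1:], start=k + 1):
            for (j0, j1) in bounds[k + 1:i + 1]:
                update(A, i0, i1, j0, j1, k0, k1)  # independent per tile
    for i in range(n):                        # clear the unused upper triangle
        for j in range(i + 1, n):
            A[i][j] = 0.0
    return A


if __name__ == "__main__":
    M = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in tiled_cholesky(M, 2):
        print(row)
```

The tile loop makes the dependency structure visible: the panel solves depend only on the factored diagonal tile, and the trailing updates of one step are mutually independent, which is the kind of parallelism a dynamic runtime can schedule across CPU cores and GPUs.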

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Keyes, David E. | en
dc.contributor.author | Alomairy, Rabab M. | en
dc.date.accessioned | 2013-08-01T08:16:46Z | -
dc.date.available | 2013-08-01T08:16:46Z | -
dc.date.issued | 2013-07 | en
dc.identifier.uri | http://hdl.handle.net/10754/297194 | en
dc.description.abstract | High-end multicore systems with GPU-based accelerators are now ubiquitous in the hardware landscape. Besides dealing with the nontrivial heterogeneous environment, end users must often take the underlying memory architecture into account to reduce the overhead of data motion, especially when running on non-uniform memory access (NUMA) platforms. We propose an approach based on the OmpSs parallel programming model and its Nanos++ dynamic runtime system to address these two challenges through 1) an innovative NUMA node-aware scheduling policy that reduces data movement between NUMA nodes and 2) a nested parallelism feature that concurrently exploits the resources of the GPU devices and the CPU host without compromising overall performance. Our approach features separation of concerns by abstracting the complexity of the hardware from end users, so that high productivity can be achieved. The Cholesky factorization is used as a benchmark representative of dense numerical linear algebra algorithms. Superior performance is also demonstrated on symmetric matrix inversion based on Cholesky factorization, commonly used in covariance computations in statistics. Performance on a NUMA system with Kepler-based GPUs exceeds that of existing implementations, while the OmpSs-enabled code remains very similar to its original sequential version. | en
dc.language.iso | en | en
dc.subject | High Productivity | en
dc.subject | Dense Linear Algebra | en
dc.subject | NUMA | en
dc.title | High Productivity Programming of Dense Linear Algebra on Heterogeneous NUMA Architectures | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Ltaief, Hatem | en
dc.contributor.committeemember | Moshkov, Mikhail | en
dc.contributor.committeemember | Shihada, Basem | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en
dc.person.id | 117653 | en