Power profiling of Cholesky and QR factorizations on distributed memory systems

Handle URI:
http://hdl.handle.net/10754/562284
Title:
Power profiling of Cholesky and QR factorizations on distributed memory systems
Authors:
Bosilca, George; Ltaief, Hatem (0000-0002-6897-1095); Dongarra, Jack
Abstract:
This paper presents the power profile of two high-performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2x speedup) but also in terms of energy efficiency (up to 62%). © 2012 Springer-Verlag (outside the USA).
KAUST Department:
KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center
Publisher:
Springer Nature
Journal:
Computer Science - Research and Development
Issue Date:
30-Aug-2012
DOI:
10.1007/s00450-012-0224-2
Type:
Article
ISSN:
1865-2034
Appears in Collections:
Articles; KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center

Full metadata record

DC Field | Value | Language
dc.contributor.author | Bosilca, George | en
dc.contributor.author | Ltaief, Hatem | en
dc.contributor.author | Dongarra, Jack | en
dc.date.accessioned | 2015-08-03T09:59:23Z | en
dc.date.available | 2015-08-03T09:59:23Z | en
dc.date.issued | 2012-08-30 | en
dc.identifier.issn | 18652034 | en
dc.identifier.doi | 10.1007/s00450-012-0224-2 | en
dc.identifier.uri | http://hdl.handle.net/10754/562284 | en
dc.description.abstract | This paper presents the power profile of two high performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2X speedup) but also in terms of energy efficiency (up to 62 %). © 2012 Springer-Verlag (outside the USA). | en
dc.publisher | Springer Nature | en
dc.subject | Dense linear algebra | en
dc.subject | Distributed memory system | en
dc.subject | Dynamic scheduler | en
dc.subject | Power profile analysis | en
dc.title | Power profiling of Cholesky and QR factorizations on distributed memory systems | en
dc.type | Article | en
dc.contributor.department | KAUST Supercomputing Laboratory (KSL) | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Computer Science - Research and Development | en
dc.contributor.institution | Innovative Computing Laboratory, University of Tennessee, Knoxville, United States | en
kaust.author | Ltaief, Hatem | en