Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction

Handle URI:
http://hdl.handle.net/10754/575758
Title:
Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction
Authors:
Ltaief, Hatem (0000-0002-6897-1095); Luszczek, Piotr R.; Dongarra, Jack
Abstract:
The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and engenders many concurrent tasks to be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold speedup and the standard two-stage PLASMA BRD by up to 20-fold speedup, on an eight-socket hexa-core AMD Opteron multicore shared-memory system. © 2012 Springer-Verlag.
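The contrast the abstract draws between domino-like panel processing and tree reduction can be illustrated with a small scheduling sketch. This is not the paper's implementation: the function names `flat_schedule` and `tree_schedule` are hypothetical, a binary tree is assumed (the abstract does not state the tree shape), and tiles are represented only by their row index within one panel. Each pair `(a, b)` stands for one task in which tile `a` eliminates tile `b`.

```python
def flat_schedule(num_tiles):
    """Domino-style elimination: tile 0 eliminates tiles 1..n-1 one after another.

    Every task reuses tile 0, so each round can hold only a single task.
    """
    return [[(0, i)] for i in range(1, num_tiles)]


def tree_schedule(num_tiles):
    """Binary-tree elimination (an assumed tree shape, for illustration only).

    Pairs within one round touch disjoint tiles, so they could run concurrently.
    """
    rounds = []
    alive = list(range(num_tiles))
    while len(alive) > 1:
        # Pair up neighbouring surviving tiles; an unpaired last tile carries over.
        pairs = [(alive[k], alive[k + 1]) for k in range(0, len(alive) - 1, 2)]
        rounds.append(pairs)
        # The first tile of each pair survives into the next round.
        alive = alive[::2]
    return rounds


if __name__ == "__main__":
    for name, schedule in (("flat", flat_schedule(8)), ("tree", tree_schedule(8))):
        print(name, "rounds:", len(schedule))
        for step, tasks in enumerate(schedule):
            print("  round", step, tasks)
```

Both schedules perform the same number of eliminations (one per tile other than tile 0), but the flat schedule needs one sequential round per elimination, whereas the tree schedule needs roughly log2 of the tile count. For tall and skinny matrices, where a panel holds many tiles, that shorter critical path is the extra dimension of parallelism the abstract refers to.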
KAUST Department:
KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center
Publisher:
Springer Science + Business Media
Journal:
Parallel Processing and Applied Mathematics
Conference/Event name:
9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011
Issue Date:
2012
DOI:
10.1007/978-3-642-31464-3_67
Type:
Conference Paper
ISSN:
03029743
ISBN:
9783642314636
Appears in Collections:
Conference Papers; KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center

Full metadata record

DC Field | Value | Language
dc.contributor.author | Ltaief, Hatem | en
dc.contributor.author | Luszczek, Piotr R. | en
dc.contributor.author | Dongarra, Jack | en
dc.date.accessioned | 2015-08-24T09:25:25Z | en
dc.date.available | 2015-08-24T09:25:25Z | en
dc.date.issued | 2012 | en
dc.identifier.isbn | 9783642314636 | en
dc.identifier.issn | 03029743 | en
dc.identifier.doi | 10.1007/978-3-642-31464-3_67 | en
dc.identifier.uri | http://hdl.handle.net/10754/575758 | en
dc.description.abstract | The objective of this paper is to enhance the parallelism of the tile bidiagonal transformation using tree reduction on multicore architectures. First introduced by Ltaief et al. [LAPACK Working Note #247, 2011], the bidiagonal transformation using tile algorithms with a two-stage approach has shown very promising results on square matrices. However, for tall and skinny matrices, the inherent problem of processing the panel in a domino-like fashion generates unnecessary sequential tasks. By using tree reduction, the panel is horizontally split, which creates another dimension of parallelism and engenders many concurrent tasks to be dynamically scheduled on the available cores. The results reported in this paper are very encouraging. The new tile bidiagonal transformation, targeting tall and skinny matrices, outperforms the state-of-the-art numerical linear algebra libraries LAPACK V3.2 and Intel MKL ver. 10.3 by up to 29-fold speedup and the standard two-stage PLASMA BRD by up to 20-fold speedup, on an eight-socket hexa-core AMD Opteron multicore shared-memory system. © 2012 Springer-Verlag. | en
dc.publisher | Springer Science + Business Media | en
dc.subject | Bidiagonal Transformation | en
dc.subject | Dynamic Scheduling | en
dc.subject | High Performance Computing | en
dc.subject | Multicore Architecture | en
dc.subject | Tree Reduction | en
dc.title | Enhancing parallelism of tile bidiagonal transformation on multicore architectures using tree reduction | en
dc.type | Conference Paper | en
dc.contributor.department | KAUST Supercomputing Laboratory (KSL) | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Parallel Processing and Applied Mathematics | en
dc.conference.date | 11 September 2011 through 14 September 2011 | en
dc.conference.name | 9th International Conference on Parallel Processing and Applied Mathematics, PPAM 2011 | en
dc.conference.location | Torun | en
dc.contributor.institution | University of Tennessee, Knoxville, TN, United States | en
dc.contributor.institution | Oak Ridge National Laboratory, United States | en
dc.contributor.institution | University of Manchester, United Kingdom | en
kaust.author | Ltaief, Hatem | en