Solving the generalized symmetric eigenvalue problem using tile algorithms on multicore architectures

Handle URI:
http://hdl.handle.net/10754/575787
Title:
Solving the generalized symmetric eigenvalue problem using tile algorithms on multicore architectures
Authors:
Ltaief, Hatem (0000-0002-6897-1095); Luszczek, Piotr R.; Haidar, Azzam; Dongarra, Jack
Abstract:
This paper proposes an efficient implementation of a solver for the generalized symmetric eigenvalue problem on multicore architectures. Based on a four-stage approach and tile algorithms, the original problem is first transformed into a standard symmetric eigenvalue problem. This is done by computing the Cholesky factorization of the right-hand-side symmetric positive definite matrix (first stage) and applying the inverse of the freshly computed triangular Cholesky factors to the original dense symmetric matrix of the problem (second stage). Computing the eigenpairs of the resulting problem is then equivalent to computing those of the original problem. The computation proceeds by reducing the updated dense symmetric matrix to symmetric band form (third stage). The band matrix is further reduced to tridiagonal form by a bulge-chasing procedure, which annihilates the extra off-diagonal entries using orthogonal transformations (fourth stage). More details on the third and fourth stages can be found in Haidar et al. [Accepted at SC'11, November 2011]. The eigenvalues are then computed from the tridiagonal form using the standard LAPACK QR algorithm (i.e., the DSTEQR routine). The complex and challenging eigenvector computation will be addressed in a companion paper. Tasks from the various stages can run concurrently and out of order. Their data dependencies are carefully tracked by the dynamic runtime system QUARK, which ensures that no dependency is violated and thus preserves numerical correctness. The resulting tile four-stage generalized symmetric eigenvalue solver significantly outperforms state-of-the-art numerical libraries on a four-socket AMD system with twelve cores per socket and a 24000×24000 matrix. It achieves up to a 21-fold speedup over multithreaded LAPACK linked against optimized multithreaded MKL BLAS, and up to a 4-fold speedup over the corresponding routine in the commercial Intel MKL library. © 2012 The authors and IOS Press. All rights reserved.
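The equivalence between the generalized and the standard problem rests on a short derivation. As a sketch, write the generalized problem as A x = λ B x, with A symmetric and B symmetric positive definite. The symbols A, B, L, C, W, T, Q_1 and Q_2 are introduced here for illustration and are not taken from the paper:

```latex
\begin{aligned}
A x &= \lambda B x, \qquad B = L L^{T} && \text{(stage 1: Cholesky factorization)}\\
L^{-1} A L^{-T}\,(L^{T} x) &= \lambda\,(L^{T} x) && \text{(multiply by } L^{-1}\text{, insert } L^{-T} L^{T} = I\text{)}\\
C y &= \lambda y, \qquad C = L^{-1} A L^{-T} = C^{T}, \quad y = L^{T} x && \text{(stage 2)}\\
C &= Q_1 W Q_1^{T}, \qquad W \text{ banded} && \text{(stage 3)}\\
W &= Q_2 T Q_2^{T}, \qquad T \text{ tridiagonal} && \text{(stage 4)}
\end{aligned}
```

Because Q_1 and Q_2 are orthogonal, T, W, C and the pencil (A, B) all share the same eigenvalues. An eigenvector y of C maps back to an eigenvector x = L^{-T} y of the original problem.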
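The four stages can be illustrated with a small dense NumPy sketch. It is written for readability, not performance: it uses dense storage and ignores tiling and the QUARK runtime entirely. All function names (`to_standard_form`, `full_to_band`, `band_to_tridiagonal`, `generalized_eigvalsh`) and the bandwidth parameter `b` are hypothetical.

Stage 3 is sketched as a column-panel QR reduction. Stage 4 is a Givens-rotation bulge chase, chosen here for brevity rather than taken from the paper's tile kernels. The final call to `numpy.linalg.eigvalsh` on the tridiagonal matrix only stands in for LAPACK's tridiagonal QR routine.

```python
import numpy as np


def givens(x1, x2):
    """Return (c, s) so that [[c, s], [-s, c]] @ [x1, x2] == [r, 0]."""
    r = np.hypot(x1, x2)
    if r == 0.0:
        return 1.0, 0.0
    return x1 / r, x2 / r


def rotate(T, p, q, c, s):
    """Apply the orthogonal similarity T <- G T G^T on rows/columns p and q."""
    G = np.array([[c, s], [-s, c]])
    T[[p, q], :] = G @ T[[p, q], :]
    T[:, [p, q]] = T[:, [p, q]] @ G.T


def to_standard_form(A, B):
    """Stages 1-2: B = L L^T (Cholesky), then C = L^{-1} A L^{-T}."""
    L = np.linalg.cholesky(B)            # lower-triangular factor
    X = np.linalg.solve(L, A)            # X = L^{-1} A
    C = np.linalg.solve(L, X.T).T        # C = X L^{-T} = (L^{-1} X^T)^T
    return 0.5 * (C + C.T)               # remove rounding asymmetry


def full_to_band(C, b):
    """Stage 3: reduce dense symmetric C to bandwidth b, one b-wide panel at a time."""
    W = C.copy()
    n = W.shape[0]
    for k in range(0, n - b, b):
        # QR of the panel below the diagonal block; Q^T makes it upper trapezoidal.
        Q, _ = np.linalg.qr(W[k + b:, k:k + b], mode="complete")
        W[k + b:, :] = Q.T @ W[k + b:, :]
        W[:, k + b:] = W[:, k + b:] @ Q
    i, j = np.indices(W.shape)
    W[np.abs(i - j) > b] = 0.0           # drop rounding noise outside the band
    return W


def band_to_tridiagonal(W, b):
    """Stage 4: Givens-based bulge chasing on a band matrix (dense storage)."""
    T = W.copy()
    n = T.shape[0]
    for j in range(n - 2):
        # Annihilate column j below the subdiagonal, starting at the band edge.
        for i in range(min(j + b, n - 1), j + 1, -1):
            c, s = givens(T[i - 1, j], T[i, j])
            rotate(T, i - 1, i, c, s)
            # The rotation creates a bulge at (k + b, k - 1); chase it off the end.
            k = i
            while k + b < n:
                c, s = givens(T[k + b - 1, k - 1], T[k + b, k - 1])
                rotate(T, k + b - 1, k + b, c, s)
                k += b
    return np.diag(T).copy(), np.diag(T, 1).copy()


def generalized_eigvalsh(A, B, b=4):
    """Eigenvalues of A x = lambda B x: stages 1-4, then a tridiagonal eigensolver."""
    d, e = band_to_tridiagonal(full_to_band(to_standard_form(A, B), b), b)
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    return np.linalg.eigvalsh(T)         # stand-in for LAPACK's tridiagonal QR (DSTEQR)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 12
    S = rng.standard_normal((n, n))
    A = S + S.T                          # symmetric
    M = rng.standard_normal((n, n))
    B = M @ M.T + n * np.eye(n)          # symmetric positive definite
    lam = generalized_eigvalsh(A, B, b=3)
    # Independent check: eigenvalues of the nonsymmetric matrix B^{-1} A.
    ref = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)
    print(np.allclose(lam, ref))
```

Every transformation after stage 1 is an orthogonal similarity, so the eigenvalues survive each stage unchanged. This is why the sketch can be checked against an unrelated route such as the eigenvalues of B^{-1} A.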
KAUST Department:
KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center
Journal:
Advances in Parallel Computing
Issue Date:
1-Jan-2012
DOI:
10.3233/978-1-61499-041-3-397
Type:
Book Chapter
ISSN:
09275452
ISBN:
9781614990406
Appears in Collections:
KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center; Book Chapters

Full metadata record

DC Field | Value | Language
dc.contributor.author | Ltaief, Hatem | en
dc.contributor.author | Luszczek, Piotr R. | en
dc.contributor.author | Haidar, Azzam | en
dc.contributor.author | Dongarra, Jack | en
dc.date.accessioned | 2015-08-24T09:26:11Z | en
dc.date.available | 2015-08-24T09:26:11Z | en
dc.date.issued | 2012-01-01 | en
dc.identifier.isbn | 9781614990406 | en
dc.identifier.issn | 09275452 | en
dc.identifier.doi | 10.3233/978-1-61499-041-3-397 | en
dc.identifier.uri | http://hdl.handle.net/10754/575787 | en
dc.subject | Bulge Chasing | en
dc.subject | Dynamic Scheduling for Multicore Systems | en
dc.subject | Generalized Symmetric Eigenvalue Problem | en
dc.subject | Tile Algorithms | en
dc.subject | Tridiagonal Reduction | en
dc.title | Solving the generalized symmetric eigenvalue problem using tile algorithms on multicore architectures | en
dc.type | Book Chapter | en
dc.contributor.department | KAUST Supercomputing Laboratory (KSL) | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Advances in Parallel Computing | en
dc.contributor.institution | Innovative Computing Laboratory, University of Tennessee, Knoxville TN, United States | en
kaust.author | Ltaief, Hatem | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.