High performance matrix inversion based on LU factorization for multicore architectures

Handle URI:
http://hdl.handle.net/10754/575750
Authors:
Dongarra, Jack; Faverge, Mathieu; Ltaief, Hatem ( 0000-0002-6897-1095 ) ; Luszczek, Piotr R.
Abstract:
The goal of this paper is to present an efficient implementation of explicit matrix inversion for general square matrices on multicore computer architectures. The inversion procedure is split into four steps: 1) computing the LU factorization, 2) inverting the upper triangular U factor, 3) solving a linear system whose solution yields the inverse of the original matrix, and 4) applying backward column pivoting to the inverted matrix. Using a tile data layout, which represents the matrix in system memory in an optimized cache-aware format, the computation of the four steps is decomposed into computational tasks. A directed acyclic graph that represents the program data flow is generated on the fly; its nodes represent tasks and its edges represent the data dependencies between them. Previous implementations of matrix inversion, available in state-of-the-art numerical libraries, suffer from unnecessary synchronization points, which our implementation eliminates in order to fully exploit the parallelism of the underlying hardware. Our algorithmic approach removes these bottlenecks and executes the tasks with loose synchronization. A runtime environment system called QUARK is necessary to dynamically schedule our numerical kernels on the available processing units. The reported results from our LU-based matrix inversion implementation significantly outperform state-of-the-art numerical libraries such as LAPACK (5x), MKL (5x) and ScaLAPACK (2.5x) on a contemporary AMD platform with four sockets and a total of 48 cores for a matrix of size 24000. A power consumption analysis shows that our high performance implementation is also energy efficient and consumes substantially less power than its competitors. © 2011 ACM.
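The four steps named in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' tiled, QUARK-scheduled implementation; it is a plain-Python, element-wise version that assumes the usual convention PA = LU with partial (row) pivoting, so that the inverse is obtained as U^{-1} L^{-1} P. All function names (lu_factor, invert_upper, solve_xl, apply_column_pivots, invert) are hypothetical and chosen only for this illustration.

```python
def lu_factor(a):
    """Step 1: LU factorization with partial pivoting (P A = L U).

    Returns a packed copy holding U on and above the diagonal and the
    unit lower triangular L below it, plus the pivot row chosen per column.
    """
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Pick the row with the largest magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def invert_upper(lu):
    """Step 2: invert the upper triangular factor U, column by column."""
    n = len(lu)
    uinv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        uinv[j][j] = 1.0 / lu[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(lu[i][k] * uinv[k][j] for k in range(i + 1, j + 1))
            uinv[i][j] = -s / lu[i][i]
    return uinv


def solve_xl(uinv, lu):
    """Step 3: solve X L = U^{-1} for X, where L is unit lower triangular."""
    n = len(lu)
    x = [row[:] for row in uinv]
    # Columns are resolved from right to left; the last column is already final.
    for j in range(n - 2, -1, -1):
        for i in range(n):
            x[i][j] -= sum(x[i][k] * lu[k][j] for k in range(j + 1, n))
    return x


def apply_column_pivots(x, piv):
    """Step 4: undo the row pivoting by swapping columns in reverse order."""
    for k in range(len(piv) - 1, -1, -1):
        p = piv[k]
        if p != k:
            for row in x:
                row[k], row[p] = row[p], row[k]
    return x


def invert(a):
    """Chain the four steps to form the explicit inverse of a square matrix."""
    lu, piv = lu_factor(a)
    return apply_column_pivots(solve_xl(invert_upper(lu), lu), piv)
```

Calling invert([[0.0, 1.0], [2.0, 3.0]]) exercises the pivoting path, since the first column forces a row swap. In the tiled setting described in the abstract, each of these loops would instead be broken into per-tile tasks whose data dependencies form the directed acyclic graph; that decomposition is not shown here.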
KAUST Department:
KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center
Publisher:
Association for Computing Machinery (ACM)
Journal:
Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers - MTAGS '11
Conference/Event name:
Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers
Issue Date:
2011
DOI:
10.1145/2132876.2132885
Type:
Conference Paper
ISBN:
9781450311458
Appears in Collections:
Conference Papers; KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center

Full metadata record

DC Field | Value | Language
dc.contributor.author | Dongarra, Jack | en
dc.contributor.author | Faverge, Mathieu | en
dc.contributor.author | Ltaief, Hatem | en
dc.contributor.author | Luszczek, Piotr R. | en
dc.date.accessioned | 2015-08-24T09:25:12Z | en
dc.date.available | 2015-08-24T09:25:12Z | en
dc.date.issued | 2011 | en
dc.identifier.isbn | 9781450311458 | en
dc.identifier.doi | 10.1145/2132876.2132885 | en
dc.identifier.uri | http://hdl.handle.net/10754/575750 | en
dc.description.abstract | The goal of this paper is to present an efficient implementation of explicit matrix inversion for general square matrices on multicore computer architectures. The inversion procedure is split into four steps: 1) computing the LU factorization, 2) inverting the upper triangular U factor, 3) solving a linear system whose solution yields the inverse of the original matrix, and 4) applying backward column pivoting to the inverted matrix. Using a tile data layout, which represents the matrix in system memory in an optimized cache-aware format, the computation of the four steps is decomposed into computational tasks. A directed acyclic graph that represents the program data flow is generated on the fly; its nodes represent tasks and its edges represent the data dependencies between them. Previous implementations of matrix inversion, available in state-of-the-art numerical libraries, suffer from unnecessary synchronization points, which our implementation eliminates in order to fully exploit the parallelism of the underlying hardware. Our algorithmic approach removes these bottlenecks and executes the tasks with loose synchronization. A runtime environment system called QUARK is necessary to dynamically schedule our numerical kernels on the available processing units. The reported results from our LU-based matrix inversion implementation significantly outperform state-of-the-art numerical libraries such as LAPACK (5x), MKL (5x) and ScaLAPACK (2.5x) on a contemporary AMD platform with four sockets and a total of 48 cores for a matrix of size 24000. A power consumption analysis shows that our high performance implementation is also energy efficient and consumes substantially less power than its competitors. © 2011 ACM. | en
dc.publisher | Association for Computing Machinery (ACM) | en
dc.subject | LU factorization | en
dc.subject | multicore parallel performance | en
dc.subject | runtime DAG scheduling | en
dc.title | High performance matrix inversion based on LU factorization for multicore architectures | en
dc.type | Conference Paper | en
dc.contributor.department | KAUST Supercomputing Laboratory (KSL) | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers - MTAGS '11 | en
dc.conference.date | November 14th, 2011 | en
dc.conference.name | Proceedings of the 2011 ACM international workshop on Many task computing on grids and supercomputers | en
dc.conference.location | Seattle, Washington | en
dc.contributor.institution | University of Tennessee, 1122 Volunteer Blvd, Knoxville, TN, United States | en
dc.contributor.institution | Computer Science and Mathematics Division, Oak Ridge National Laboratory, United States | en
dc.contributor.institution | School of Mathematics, School of Computer Science, University of Manchester, United Kingdom | en
kaust.author | Ltaief, Hatem | en