High Performance Polar Decomposition on Manycore Systems and its application to Symmetric Eigensolvers and the Singular Value Decomposition
Advisors: Keyes, David E.
Permanent link to this record: http://hdl.handle.net/10754/652466
Abstract: The Polar Decomposition (PD) of a dense matrix is an important operation in linear algebra and a building block for solving the Symmetric Eigenvalue Problem (SEP) and computing the Singular Value Decomposition (SVD). It can be computed directly from the SVD itself, or iteratively using the QR-based Dynamically Weighted Halley (QDWH) algorithm. The former is difficult to parallelize because the bidiagonal reduction is dominated by memory-bound operations. The latter is an iterative method that performs more floating-point operations than the SVD approach but, at the same time, exposes more parallelism. Given the roadmap of hardware technology scaling, algorithms that perform floating-point operations on locally cached data should be favored over those requiring expensive horizontal data movement. In this context, this thesis investigates new high-performance algorithmic designs of the QDWH algorithm, originally introduced by Nakatsukasa et al. [1, 2], to compute the PD. Our algorithmic contributions include mixed-precision techniques, task-based formulations, and parallel asynchronous executions. Moreover, by making the PD competitive, we make its application to the SEP and the SVD practical. In particular, we introduce for the first time new algorithms for computing a partial SVD using QDWH. By the same token, we extend QDWH to support partial eigendecomposition for the SEP. We present new high-performance implementations of the QDWH-based algorithms relying on fine-grained computations, which allows exploiting the sparsity of the underlying data structure. To demonstrate performance efficiency, portability, and scalability, we conduct benchmarking campaigns on some of the latest shared- and distributed-memory systems. Our QDWH-based implementations outperform state-of-the-art numerical libraries by up to 2.8x on shared-memory and 12x on distributed-memory systems.
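To make the two routes to the PD concrete, here is a minimal NumPy sketch of an SVD-based PD, a QR-based QDWH iteration, and the standard step that turns a PD into an SVD. It assumes a real, full-column-rank m x n matrix with m >= n, computes ||A||_2 and the smallest singular value exactly instead of estimating them, and uses the QR-based update at every iteration (no Cholesky-based switch, mixed precision, or task-based execution). The function names `polar_svd`, `polar_qdwh`, `_qdwh_weights`, and `svd_via_polar` are hypothetical; the dynamic weights (a, b, c) follow the original QDWH formulation. This is an illustration only, not the thesis's implementation.

```python
import numpy as np


def polar_svd(A):
    """PD from the thin SVD: A = W diag(s) V^T gives U = W V^T and H = V diag(s) V^T."""
    W, s, Vt = np.linalg.svd(A, full_matrices=False)
    return W @ Vt, Vt.T @ (s[:, None] * Vt)


def _qdwh_weights(l):
    """Dynamic weights (a, b, c) for a lower bound l on sigma_min of the current iterate."""
    l = min(l, 1.0)  # rounding can push l marginally above 1
    gamma = np.cbrt(4.0 * (1.0 - l**2) / l**4)
    sq = np.sqrt(1.0 + gamma)
    a = sq + 0.5 * np.sqrt(8.0 - 4.0 * gamma + 8.0 * (2.0 - l**2) / (l**2 * sq))
    b = (a - 1.0) ** 2 / 4.0
    c = a + b - 1.0
    return a, b, c


def polar_qdwh(A, max_iter=20):
    """Sketch of the QR-based QDWH iteration for real m x n A (m >= n, full column rank)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    eps = np.finfo(A.dtype).eps
    X = A / np.linalg.norm(A, 2)   # singular values of X_0 now lie in (0, 1]
    l = 1.0 / np.linalg.cond(A)    # sigma_min(X_0), computed exactly for simplicity
    I = np.eye(n)
    for _ in range(max_iter):
        a, b, c = _qdwh_weights(l)
        # Reduced QR of the stacked (m + n) x n matrix [sqrt(c) X; I].
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))
        Q1, Q2 = Q[:m], Q[m:]
        # X_{k+1} = (b/c) X_k + (a - b/c) / sqrt(c) * Q1 Q2^T
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        l = l * (a + b * l**2) / (1.0 + c * l**2)   # update the lower bound
        diff = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if diff <= (5 * eps) ** (1 / 3) and abs(1.0 - l) <= 5 * eps:
            break
    H = X.T @ A
    return X, 0.5 * (H + H.T)      # symmetrize H to remove rounding asymmetry


def svd_via_polar(A):
    """SVD from the PD: A = U_p H and H = V diag(s) V^T give A = (U_p V) diag(s) V^T."""
    Up, H = polar_qdwh(A)
    s, V = np.linalg.eigh(H)       # eigenvalues of the SPD factor, ascending
    s, V = s[::-1], V[:, ::-1]     # reorder to the usual descending convention
    return Up @ V, s, V.T
```

For a well-conditioned A, `polar_qdwh(A)` should agree with `polar_svd(A)` up to rounding error. The QR-based form is used throughout because it is the numerically safe variant; the original QDWH switches to a cheaper Cholesky-based update once c_k becomes small enough, which this sketch omits for brevity.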
The task-based QDWH has been integrated into the Chameleon library (https://gitlab.inria.fr/solverstack/chameleon) to support shared-memory systems with hardware accelerators. It is also currently used by astronomers at the Subaru Telescope, located at the summit of Mauna Kea, Hawaii, USA. The distributed-memory software libraries for QDWH and its SVD extension are freely available under the modified-BSD license at https://github.com/ecrc/qdwh.git and https://github.com/ecrc/ksvd.git, respectively. Both libraries have been integrated into the Cray Scientific numerical library LibSci v17.11.1 and v19.02.1.