High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems

dc.contributor.advisorKeyes, David E.
dc.contributor.authorAlomairy, Rabab M.
dc.contributor.committeememberMoshkov, Mikhail
dc.contributor.committeememberHadwiger, Markus
dc.contributor.committeememberLtaief, Hatem
dc.contributor.departmentComputer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.date.accessioned2022-07-21T12:59:38Z
dc.date.available2022-07-21T12:59:38Z
dc.date.issued2022-07-20
dc.description.abstractTo leverage the extreme parallelism of emerging architectures, so that scientific applications can fulfill their high-fidelity and multi-physics potential while sustaining high efficiency relative to the limiting resource, numerical algorithms must be redesigned. Algorithmic redesign can shift the limiting resource, for example from memory or communication to arithmetic capacity. The benefit of algorithmic redesign expands greatly when it introduces a tunable tradeoff between accuracy and resources. Scientific applications from diverse sources rely on dense matrix operations. These operations arise in Schur complements, integral equations, covariances in spatial statistics, ridge regression, radial basis functions from unstructured meshes, and kernel matrices from machine learning, among others. This thesis demonstrates how to extend the problem sizes that may be treated and to reduce their execution time. Two “universes” of algorithmic innovations have emerged to improve computations by orders of magnitude in capacity and runtime. Each introduces a hierarchy, of rank or of precision. Tile Low-Rank (TLR) approximation replaces blocks of a dense operator with blocks of low rank. Mixed precision approximation, increasingly well supported by contemporary hardware, replaces blocks of high precision with blocks of low precision. Herein, we design new high-performance direct solvers based on the synergy of TLR and mixed precision. Since adapting to data sparsity leads to heterogeneous workloads, we rely on task-based runtime systems to orchestrate the scheduling of fine-grained kernels onto computational resources. We first demonstrate how TLR accelerates acoustic scattering and mesh deformation simulations. Our solvers outperform state-of-the-art libraries by up to an order of magnitude. Then, we demonstrate the impact of enabling mixed precision in a bioinformatics context, where it yields up to a threefold speedup. To facilitate the adoption of task-based runtime systems, we introduce the AL4SAN library, which provides a common API for the expression and queueing of tasks across multiple dynamic runtime systems. This library handles a variety of workloads at low overhead while increasing user productivity. AL4SAN enables interoperability by switching between runtime systems during execution, which achieves a twofold speedup on a task-based generalized symmetric eigenvalue solver.
dc.identifier.citationAlomairy, R. M. (2022). High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems [KAUST Research Repository]. https://doi.org/10.25781/KAUST-F864U
dc.identifier.doi10.25781/KAUST-F864U
dc.identifier.orcid0000-0001-9911-6094
dc.identifier.urihttp://hdl.handle.net/10754/679787
dc.language.isoen
dc.person.id117653
dc.subjectMixed Precision
dc.subjectLow-Rank Approximation
dc.subjectTask-based Runtime Systems
dc.subjectScientific Applications
dc.subjectAcoustic Scattering
dc.subjectMesh Deformation
dc.subjectGenome-Wide Association Study
dc.subjectDense Linear Algebra Algorithms
dc.subjectAbstraction Layer
dc.subjectLU/Cholesky-based Solver
dc.titleHigh-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems
dc.typeDissertation
kaust.availability.selectionRelease the work immediately for public access* on the internet through the KAUST Repository.
kaust.gpcaida.hoteit@kaust.edu.sa
kaust.request.doiyes
kaust.thesis.advisorApprovalRequestedYes, I have already submitted the final approval form to my advisor.
orcid.id0000-0002-6897-1095
orcid.id0000-0003-1239-4871
orcid.id0000-0003-0085-9483
orcid.id0000-0002-4052-7224
orcid.id0000-0001-9911-6094
refterms.dateFOA2022-07-21T12:59:40Z
thesis.degree.disciplineChemical Science
thesis.degree.grantorKing Abdullah University of Science and Technology
thesis.degree.nameDoctor of Philosophy
Files
Original bundle
Name:
2022_thesis_rabab_final.pdf
Size:
11.9 MB
Format:
Adobe Portable Document Format
Description:
PhD Dissertation
License bundle
Name:
license.txt
Size:
919 B
Format:
Item-specific license agreed upon to submission
Description: