Asynchronous Task-Based Parallelization of Algebraic Multigrid

Handle URI:
http://hdl.handle.net/10754/625632
Title:
Asynchronous Task-Based Parallelization of Algebraic Multigrid
Authors:
AlOnazi, Amani A.; Markomanolis, George S.; Keyes, David E. (ORCID: 0000-0002-4052-7224)
Abstract:
As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous. We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using hybrid MPI+OmpSs (from the Barcelona Supercomputing Center) within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and electromagnetic diffusion, respectively. In time to solution for a full solve, an MPI+OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per-node advantage as both scale weakly to thousands of cores, with MPI between nodes.
KAUST Department:
Extreme Computing Research Center; KAUST Supercomputing Laboratory (KSL)
Citation:
AlOnazi A, Markomanolis GS, Keyes D (2017) Asynchronous Task-Based Parallelization of Algebraic Multigrid. Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC ’17. Available: http://dx.doi.org/10.1145/3093172.3093230.
Publisher:
ACM Press
Journal:
Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC '17
Conference/Event name:
Platform for Advanced Scientific Computing Conference, PASC 2017
Issue Date:
23-Jun-2017
DOI:
10.1145/3093172.3093230
Type:
Conference Paper
Sponsors:
We thank Hatem Ltaief, Stefano Zampini, and Lisandro Dalcin of the Extreme Computing Research Center at KAUST for their help. We also thank Ulrike Yang from Lawrence Livermore National Laboratory for her useful comments. For performance tests on the Shaheen II Cray XC40 supercomputer we gratefully acknowledge the KAUST Supercomputing Laboratory.
Additional Links:
http://dl.acm.org/citation.cfm?doid=3093172.3093230
Appears in Collections:
Conference Papers; KAUST Supercomputing Laboratory (KSL); Extreme Computing Research Center

Full metadata record

DC Field | Value | Language
dc.contributor.author | AlOnazi, Amani A. | en
dc.contributor.author | Markomanolis, George S. | en
dc.contributor.author | Keyes, David E. | en
dc.date.accessioned | 2017-10-03T12:49:30Z | -
dc.date.available | 2017-10-03T12:49:30Z | -
dc.date.issued | 2017-06-23 | en
dc.identifier.citation | AlOnazi A, Markomanolis GS, Keyes D (2017) Asynchronous Task-Based Parallelization of Algebraic Multigrid. Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC ’17. Available: http://dx.doi.org/10.1145/3093172.3093230. | en
dc.identifier.doi | 10.1145/3093172.3093230 | en
dc.identifier.uri | http://hdl.handle.net/10754/625632 | -
dc.description.abstract | As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous. We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using hybrid MPI+OmpSs (from the Barcelona Supercomputing Center) within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and electromagnetic diffusion, respectively. In time to solution for a full solve, an MPI+OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per-node advantage as both scale weakly to thousands of cores, with MPI between nodes. | en
dc.description.sponsorship | We thank Hatem Ltaief, Stefano Zampini, and Lisandro Dalcin of the Extreme Computing Research Center at KAUST for their help. We also thank Ulrike Yang from Lawrence Livermore National Laboratory for her useful comments. For performance tests on the Shaheen II Cray XC40 supercomputer we gratefully acknowledge the KAUST Supercomputing Laboratory. | en
dc.publisher | ACM Press | en
dc.relation.url | http://dl.acm.org/citation.cfm?doid=3093172.3093230 | en
dc.subject | Additive multigrid | en
dc.subject | Hybrid mpi ompss implementation | en
dc.subject | Multigrid | en
dc.subject | Task-based parallelism | en
dc.title | Asynchronous Task-Based Parallelization of Algebraic Multigrid | en
dc.type | Conference Paper | en
dc.contributor.department | Extreme Computing Research Center | en
dc.contributor.department | KAUST Supercomputing Laboratory (KSL) | en
dc.identifier.journal | Proceedings of the Platform for Advanced Scientific Computing Conference on - PASC '17 | en
dc.conference.date | 2017-06-26 to 2017-06-28 | en
dc.conference.name | Platform for Advanced Scientific Computing Conference, PASC 2017 | en
dc.conference.location | Lugano, CHE | en
kaust.author | AlOnazi, Amani A. | en
kaust.author | Markomanolis, George S. | en
kaust.author | Keyes, David E. | en