A performance model for the communication in fast multipole methods on high-performance computing platforms

Handle URI:
http://hdl.handle.net/10754/622506
Title:
A performance model for the communication in fast multipole methods on high-performance computing platforms
Authors:
Ibeid, Huda (0000-0001-5208-5366); Yokota, Rio (0000-0001-7573-7873); Keyes, David E. (0000-0002-4052-7224)
Abstract:
Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity combined with its linear complexity and asynchronous communication patterns make it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model is predictive in an absolute sense; however, on complex systems, this objective is often out of reach or of a difficulty out of proportion to its benefit when there exists a simpler model that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Ibeid H, Yokota R, Keyes D (2016) A performance model for the communication in fast multipole methods on high-performance computing platforms. International Journal of High Performance Computing Applications 30: 423–437. Available: http://dx.doi.org/10.1177/1094342016634819.
Publisher:
SAGE Publications
Journal:
International Journal of High Performance Computing Applications
Issue Date:
4-Mar-2016
DOI:
10.1177/1094342016634819
Type:
Article
ISSN:
1094-3420; 1741-2846
Sponsors:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We acknowledge system access and the generous assistance of the staff at four facilities for the performance tests herein: the KAUST Supercomputing Laboratory; the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357; the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC05-00OR22725; and the Swiss National Supercomputing Centre (CSCS), under project ID g81.
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Ibeid, Huda | en
dc.contributor.author | Yokota, Rio | en
dc.contributor.author | Keyes, David E. | en
dc.date.accessioned | 2017-01-02T09:55:27Z | -
dc.date.available | 2017-01-02T09:55:27Z | -
dc.date.issued | 2016-03-04 | en
dc.identifier.citation | Ibeid H, Yokota R, Keyes D (2016) A performance model for the communication in fast multipole methods on high-performance computing platforms. International Journal of High Performance Computing Applications 30: 423–437. Available: http://dx.doi.org/10.1177/1094342016634819. | en
dc.identifier.issn | 1094-3420 | en
dc.identifier.issn | 1741-2846 | en
dc.identifier.doi | 10.1177/1094342016634819 | en
dc.identifier.uri | http://hdl.handle.net/10754/622506 | -
dc.description.abstract | Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity combined with its linear complexity and asynchronous communication patterns make it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model is predictive in an absolute sense; however, on complex systems, this objective is often out of reach or of a difficulty out of proportion to its benefit when there exists a simpler model that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance. | en
dc.description.sponsorship | The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We acknowledge system access and the generous assistance of the staff at four facilities for the performance tests herein: the KAUST Supercomputing Laboratory; the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357; the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC05-00OR22725; and the Swiss National Supercomputing Centre (CSCS), under project ID g81. | en
dc.publisher | SAGE Publications | en
dc.subject | communication complexity | en
dc.subject | communication performance models | en
dc.subject | fast multipole method | en
dc.title | A performance model for the communication in fast multipole methods on high-performance computing platforms | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | International Journal of High Performance Computing Applications | en
kaust.author | Ibeid, Huda | en
kaust.author | Yokota, Rio | en
kaust.author | Keyes, David E. | en