Performance Benchmarking of Fast Multipole Methods

Handle URI:
http://hdl.handle.net/10754/293890
Title:
Performance Benchmarking of Fast Multipole Methods
Authors:
Al-Harthi, Noha A.
Abstract:
The current trends in computer architecture are shifting towards smaller byte/flop ratios, while available parallelism is increasing at all levels of granularity – vector length, core count, and MPI process. Intel’s Xeon Phi coprocessor, NVIDIA’s Kepler GPU, and IBM’s BlueGene/Q all have a Byte/flop ratio close to 0.2, which makes it very difficult for most algorithms to extract a high percentage of the theoretical peak flop/s from these architectures. Popular algorithms in scientific computing such as FFT are continuously evolving to keep up with this trend in hardware. In the meantime it is also necessary to invest in novel algorithms that are more suitable for computer architectures of the future. The fast multipole method (FMM) was originally developed as a fast algorithm for ap- proximating the N-body interactions that appear in astrophysics, molecular dynamics, and vortex based fluid dynamics simulations. The FMM possesses have a unique combination of being an efficient O(N) algorithm, while having an operational intensity that is higher than a matrix-matrix multiplication. In fact, the FMM can reduce the requirement of Byte/flop to around 0.01, which means that it will remain compute bound until 2020 even if the cur- rent trend in microprocessors continues. Despite these advantages, there have not been any benchmarks of FMM codes on modern architectures such as Xeon Phi, Kepler, and Blue- Gene/Q. This study aims to provide a comprehensive benchmark of a state of the art FMM code “exaFMM” on the latest architectures, in hopes of providing a useful reference for deciding when the FMM will become useful as the computational engine in a given application code. It may also serve as a warning to certain problem size domains areas where the FMM will exhibit insignificant performance improvements. Such issues depend strongly on the asymptotic constants rather than the asymptotics themselves, and therefore are strongly implementation and hardware dependent. The primary objective of this study is to provide these constants on various computer architectures.
Advisors:
Keyes, David E. ( 0000-0002-4052-7224 )
Committee Member:
Bagci, Hakan ( 0000-0003-3867-5786 ) ; Ravasi, Timothy ( 0000-0002-9950-465X )
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
Jun-2013
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC FieldValue Language
dc.contributor.advisorKeyes, David E.en
dc.contributor.authorAl-Harthi, Noha A.en
dc.date.accessioned2013-06-12T13:40:05Z-
dc.date.available2013-06-12T13:40:05Z-
dc.date.issued2013-06en
dc.identifier.urihttp://hdl.handle.net/10754/293890en
dc.description.abstractThe current trends in computer architecture are shifting towards smaller byte/flop ratios, while available parallelism is increasing at all levels of granularity – vector length, core count, and MPI process. Intel’s Xeon Phi coprocessor, NVIDIA’s Kepler GPU, and IBM’s BlueGene/Q all have a Byte/flop ratio close to 0.2, which makes it very difficult for most algorithms to extract a high percentage of the theoretical peak flop/s from these architectures. Popular algorithms in scientific computing such as FFT are continuously evolving to keep up with this trend in hardware. In the meantime it is also necessary to invest in novel algorithms that are more suitable for computer architectures of the future. The fast multipole method (FMM) was originally developed as a fast algorithm for ap- proximating the N-body interactions that appear in astrophysics, molecular dynamics, and vortex based fluid dynamics simulations. The FMM possesses have a unique combination of being an efficient O(N) algorithm, while having an operational intensity that is higher than a matrix-matrix multiplication. In fact, the FMM can reduce the requirement of Byte/flop to around 0.01, which means that it will remain compute bound until 2020 even if the cur- rent trend in microprocessors continues. Despite these advantages, there have not been any benchmarks of FMM codes on modern architectures such as Xeon Phi, Kepler, and Blue- Gene/Q. This study aims to provide a comprehensive benchmark of a state of the art FMM code “exaFMM” on the latest architectures, in hopes of providing a useful reference for deciding when the FMM will become useful as the computational engine in a given application code. It may also serve as a warning to certain problem size domains areas where the FMM will exhibit insignificant performance improvements. Such issues depend strongly on the asymptotic constants rather than the asymptotics themselves, and therefore are strongly implementation and hardware dependent. The primary objective of this study is to provide these constants on various computer architectures.en
dc.language.isoenen
dc.subjectFast Multiple Methoden
dc.subjectFMMen
dc.subjectBenchmarken
dc.subjectScalabilityen
dc.subjectload balancingen
dc.subjectGPUen
dc.titlePerformance Benchmarking of Fast Multipole Methodsen
dc.typeThesisen
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Divisionen
thesis.degree.grantorKing Abdullah University of Science and Technologyen_GB
dc.contributor.committeememberBagci, Hakanen
dc.contributor.committeememberRavasi, Timothyen
thesis.degree.disciplineComputer Scienceen
thesis.degree.nameMaster of Scienceen
dc.person.id118367en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.