AuthorsAl-Harthi, Noha A.
AdvisorsKeyes, David E.
Permanent link to this recordhttp://hdl.handle.net/10754/293890
MetadataShow full item record
AbstractThe current trends in computer architecture are shifting towards smaller byte/flop ratios, while available parallelism is increasing at all levels of granularity – vector length, core count, and MPI process. Intel’s Xeon Phi coprocessor, NVIDIA’s Kepler GPU, and IBM’s BlueGene/Q all have a Byte/flop ratio close to 0.2, which makes it very difficult for most algorithms to extract a high percentage of the theoretical peak flop/s from these architectures. Popular algorithms in scientific computing such as FFT are continuously evolving to keep up with this trend in hardware. In the meantime it is also necessary to invest in novel algorithms that are more suitable for computer architectures of the future. The fast multipole method (FMM) was originally developed as a fast algorithm for ap- proximating the N-body interactions that appear in astrophysics, molecular dynamics, and vortex based fluid dynamics simulations. The FMM possesses have a unique combination of being an efficient O(N) algorithm, while having an operational intensity that is higher than a matrix-matrix multiplication. In fact, the FMM can reduce the requirement of Byte/flop to around 0.01, which means that it will remain compute bound until 2020 even if the cur- rent trend in microprocessors continues. Despite these advantages, there have not been any benchmarks of FMM codes on modern architectures such as Xeon Phi, Kepler, and Blue- Gene/Q. This study aims to provide a comprehensive benchmark of a state of the art FMM code “exaFMM” on the latest architectures, in hopes of providing a useful reference for deciding when the FMM will become useful as the computational engine in a given application code. It may also serve as a warning to certain problem size domains areas where the FMM will exhibit insignificant performance improvements. Such issues depend strongly on the asymptotic constants rather than the asymptotics themselves, and therefore are strongly implementation and hardware dependent. The primary objective of this study is to provide these constants on various computer architectures.