Type
Conference Paper
KAUST Department
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Extreme Computing Research Center
Date
2012-11
Permanent link to this record
http://hdl.handle.net/10754/564624
Abstract
This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using the lightweight task parallel library MassiveThreads. Although there have been many attempts at parallelizing FMM, experience has been limited almost exclusively to formulations based on flat homogeneous parallel loops. FMM in fact contains operations that cannot be readily expressed in such conventional but restrictive models. We show that task parallelism, or parallel recursion in particular, allows us to parallelize all operations of FMM naturally and scalably. Moreover, it allows us to parallelize a "mutual interaction" for force/potential evaluation, which is roughly twice as efficient as a more conventional, unidirectional force/potential evaluation. The net result is an open source FMM that is clearly among the fastest single node implementations, including those on GPUs; with a million particles on a 32-core 2.20 GHz Sandy Bridge node, it completes a single time step, including tree construction and force/potential evaluation, in 65 milliseconds. The study clearly showcases both the programmability and the performance benefits of flexible parallel constructs over more monolithic parallel loops. © 2012 IEEE.
Citation
Taura, K., Nakashima, J., Yokota, R., & Maruyama, N. (2012). A Task Parallel Implementation of Fast Multipole Methods. 2012 SC Companion: High Performance Computing, Networking Storage and Analysis. doi:10.1109/sc.companion.2012.86
Conference/Event name
2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012
ISBN
9780769549569
DOI
10.1109/SC.Companion.2012.86