An FMM based on dual tree traversal for many-core architectures

Handle URI:
http://hdl.handle.net/10754/562937
Title:
An FMM based on dual tree traversal for many-core architectures
Authors:
Yokota, Rio (ORCID: 0000-0001-7573-7873)
Abstract:
The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogeneous architectures. Focus is placed on low-accuracy optimizations, in response to the recent interest in using the fast multipole method (FMM) as a preconditioner for sparse linear solvers. A direct comparison with other state-of-the-art fast N-body codes demonstrates that performance gains of orders of magnitude can be achieved by careful selection of the optimal algorithm and by low-level optimization of the code. The present N-body solver uses an FMM with an efficient strategy for finding the list of cell-cell interactions by a dual tree traversal. A task-based threading model is used to maximize thread-level parallelism and intra-node load balancing. To extract the full potential of the SIMD units on the latest CPUs, the inner kernels are optimized using AVX instructions.
KAUST Department:
Extreme Computing Research Center
Publisher:
SAGE Publications
Journal:
Journal of Algorithms & Computational Technology
Issue Date:
Sep-2013
DOI:
10.1260/1748-3018.7.3.301
ARXIV:
arXiv:1209.3516
Type:
Article
ISSN:
1748-3018
Additional Links:
http://arxiv.org/abs/arXiv:1209.3516v3
Appears in Collections:
Articles; Extreme Computing Research Center
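The abstract states that the list of cell-cell interactions is found by a dual tree traversal. As a rough illustration only, the Python sketch below shows the general idea on a 2D quadtree; it is not the paper's implementation, which is 3D, task-parallel and AVX-optimized. The names `Cell`, `build_quadtree` and `dual_tree_traversal`, the leaf size `ncrit`, the opening-angle parameter `theta`, and the acceptance test (R_a + R_b) < theta * distance are all hypothetical choices made for this example.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Square cell of a hypothetical 2D quadtree (illustration only)."""
    center: tuple          # (x, y) of the square's center
    half: float            # half of the square's side length
    points: list           # indices of the bodies contained in this cell
    children: list = field(default_factory=list)

    @property
    def radius(self):
        # Radius of the circle circumscribing the square cell.
        return self.half * math.sqrt(2.0)


def build_quadtree(pts, idx, center, half, ncrit, depth=0, max_depth=20):
    """Recursively split cells holding more than `ncrit` bodies."""
    cell = Cell(center, half, idx)
    if len(idx) > ncrit and depth < max_depth:
        buckets = [[], [], [], []]
        for i in idx:
            x, y = pts[i]
            # Quadrant bit 0: right of center; bit 1: above center.
            q = int(x >= center[0]) + 2 * int(y >= center[1])
            buckets[q].append(i)
        for q, bucket in enumerate(buckets):
            if bucket:  # only non-empty quadrants become children
                cx = center[0] + (half / 2 if q & 1 else -half / 2)
                cy = center[1] + (half / 2 if q & 2 else -half / 2)
                cell.children.append(
                    build_quadtree(pts, bucket, (cx, cy), half / 2,
                                   ncrit, depth + 1, max_depth))
    return cell


def dual_tree_traversal(a, b, theta, m2l, p2p):
    """Classify the pair (target cell a, source cell b).

    Well-separated pairs go to the cell-cell list `m2l`; pairs of nearby
    leaves go to the direct-summation list `p2p`; otherwise the larger
    cell (or the only non-leaf) is split and the traversal recurses.
    """
    dist = math.hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
    if a.radius + b.radius < theta * dist:   # assumed acceptance criterion
        m2l.append((a, b))
    elif not a.children and not b.children:
        p2p.append((a, b))
    elif not b.children or (a.children and a.radius >= b.radius):
        for child in a.children:
            dual_tree_traversal(child, b, theta, m2l, p2p)
    else:
        for child in b.children:
            dual_tree_traversal(a, child, theta, m2l, p2p)


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(200)]
    root = build_quadtree(pts, list(range(len(pts))), (0.5, 0.5), 0.5, ncrit=8)
    m2l, p2p = [], []
    dual_tree_traversal(root, root, theta=0.5, m2l=m2l, p2p=p2p)
    print(f"{len(m2l)} cell-cell pairs, {len(p2p)} leaf-leaf pairs")
```

Starting from the pair (root, root), every ordered pair of bodies ends up covered exactly once, either through a cell-cell pair or a leaf-leaf pair. In a full FMM the first list would drive multipole-to-local translations and the second the direct particle-particle kernels, which are the kind of inner loops that SIMD instructions such as AVX accelerate. Splitting the larger of the two cells is one common heuristic for keeping paired cells of comparable size; other splitting rules are possible.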

Full metadata record

DC Field | Value | Language
dc.contributor.author | Yokota, Rio | en
dc.date.accessioned | 2015-08-03T11:16:14Z | en
dc.date.available | 2015-08-03T11:16:14Z | en
dc.date.issued | 2013-09 | en
dc.identifier.issn | 17483018 | en
dc.identifier.doi | 10.1260/1748-3018.7.3.301 | en
dc.identifier.uri | http://hdl.handle.net/10754/562937 | en
dc.publisher | SAGE Publications | en
dc.relation.url | http://arxiv.org/abs/arXiv:1209.3516v3 | en
dc.title | An FMM based on dual tree traversal for many-core architectures | en
dc.type | Article | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | Journal of Algorithms & Computational Technology | en
dc.identifier.arxivid | arXiv:1209.3516 | en
kaust.author | Yokota, Rio | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.