Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

Extracting maximum performance from multi-core architectures is difficult, primarily because of the bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly optimized fork-join based implementation of the fast multipole method (FMM) and extend it to a data-driven implementation using a distributed task scheduling approach. This study exposes some limitations of the conventional fork-join implementation in terms of synchronization overheads. We find that these overheads are not negligible and that eliminating them with the data-driven method, combined with a careful data locality strategy, is beneficial. Experimental evaluation of both methods on state-of-the-art multi-socket multi-core architectures shows speed-ups of up to 22% for the data-driven approach over the original method. We demonstrate that a data-driven execution of the FMM not only improves performance by avoiding global synchronization overheads but also reduces the memory-bandwidth pressure caused by memory-intensive computations. © 2013 Springer-Verlag.

Amer, A., Maruyama, N., Pericàs, M., Taura, K., Yokota, R., & Matsuoka, S. (2013). Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM. Supercomputing, 255–266. doi:10.1007/978-3-642-38750-0_19

Springer Berlin Heidelberg

Lecture Notes in Computer Science

Conference/Event Name
28th International Supercomputing Conference, ISC 2013

