Maximizing I/O bandwidth for reverse time migration on heterogeneous large-scale systems
dc.contributor.author | Alturkestani, Tariq Lutfallah Mohammed | |
dc.contributor.author | Ltaief, Hatem | |
dc.contributor.author | Keyes, David E. | |
dc.date.accessioned | 2020-09-16T13:07:36Z | |
dc.date.available | 2020-09-16T13:07:36Z | |
dc.date.issued | 2020-08-18 | |
dc.identifier.citation | Alturkestani, T., Ltaief, H., & Keyes, D. (2020). Maximizing I/O Bandwidth for Reverse Time Migration on Heterogeneous Large-Scale Systems. Lecture Notes in Computer Science, 263–278. doi:10.1007/978-3-030-57675-2_17 | |
dc.identifier.isbn | 9783030576745 | |
dc.identifier.issn | 1611-3349 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.doi | 10.1007/978-3-030-57675-2_17 | |
dc.identifier.uri | http://hdl.handle.net/10754/665194 | |
dc.description.abstract | Reverse Time Migration (RTM) is an important scientific application for oil and gas exploration. The 3D RTM simulation generates terabytes of intermediate data that does not fit in main memory. In particular, RTM has two successive computational phases, i.e., the forward modeling and the backward propagation, that necessitate to write and then to read the state of the computed solution grid at specific time steps of the time integration. Advances in memory architecture have made it feasible and affordable to integrate hierarchical storage media on large-scale systems, starting from the traditional Parallel File Systems (PFS) to intermediate fast disk technologies (e.g., node-local and remote-shared Burst Buffer) and up to CPU main memory. To address the trend of heterogeneous HPC systems deployment, we introduce an extension to our Multilayer Buffer System (MLBS) framework to further maximize RTM I/O bandwidth in presence of GPU hardware accelerators. The main idea is to leverage the GPU’s High Bandwidth Memory (HBM) as an additional storage media layer. The objective of MLBS is ultimately to hide the application’s I/O overhead by enabling a buffering mechanism operating across all the hierarchical storage media layers. MLBS is therefore able to sustain the I/O bandwidth at each storage media layer. By asynchronously performing expensive I/O operations and creating opportunities for overlapping data motion with computations, MLBS may transform the original I/O bound behavior of the RTM application into a compute-bound regime. In fact, the prefetching strategy of MLBS allows the RTM application to believe that it has access to a larger memory capacity on the GPU, while transparently performing the necessary housekeeping across the storage layers. 
We demonstrate the effectiveness of MLBS on the Summit supercomputer using 2048 compute nodes equipped with a total of 12288 GPUs by achieving up to 1.4X performance speedup compared to the reference PFS-based RTM implementation for large 3D solution grid. | |
dc.description.sponsorship | For computer time, this research used the resources of the Supercomputing Laboratory at King Abdullah University of Science & Technology (KAUST) in Thuwal, Saudi Arabia and the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. We would like to thank Rached Abdelkhalak from NVIDIA for the insightful discussions and the anonymous reviewers for their constructive comments to improve this paper. This research was partially supported by Saudi Aramco through KAUST OSR contract #3226. | |
dc.publisher | Springer Nature | |
dc.relation.url | http://link.springer.com/10.1007/978-3-030-57675-2_17 | |
dc.rights | Archived with thanks to Springer International Publishing | |
dc.title | Maximizing I/O bandwidth for reverse time migration on heterogeneous large-scale systems | |
dc.type | Conference Paper | |
dc.contributor.department | Computer Science Program | |
dc.contributor.department | Computer Science | |
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | |
dc.contributor.department | Extreme Computing Research Center | |
dc.contributor.department | Applied Mathematics and Computational Science Program | |
dc.contributor.department | Office of the President | |
dc.conference.date | 2020-08-24 to 2020-08-28 | |
dc.conference.name | 26th International European Conference on Parallel and Distributed Computing, Euro-Par 2020 | |
dc.conference.location | Warsaw, POL | |
dc.eprint.version | Post-print | |
dc.identifier.volume | 12247 LNCS | |
dc.identifier.pages | 263-278 | |
kaust.person | Alturkestani, Tariq Lutfallah Mohammed | |
kaust.person | Ltaief, Hatem | |
kaust.person | Keyes, David E. | |
dc.identifier.eid | 2-s2.0-85090098179 | |
refterms.dateFOA | 2020-09-17T05:29:50Z | |
kaust.acknowledged.supportUnit | OSR | |
kaust.acknowledged.supportUnit | Supercomputing Laboratory | |
dc.date.published-online | 2020-08-18 | |
dc.date.published-print | 2020 |
This item appears in the following Collection(s)
- Conference Papers
- Applied Mathematics and Computational Science Program
  For more information visit: https://cemse.kaust.edu.sa/amcs
- Extreme Computing Research Center
- Computer Science Program
  For more information visit: https://cemse.kaust.edu.sa/cs
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
  For more information visit: https://cemse.kaust.edu.sa/