Asynchronous Task-Based Parallelism in Seismic Imaging and Reservoir Modeling Simulations
AdvisorsKeyes, David E.
Permanent link to this recordhttp://hdl.handle.net/10754/656670
MetadataShow full item record
AbstractThe components of high-performance systems continue to become more complex on the road to exascale. This complexity is exposed at the level of: multi/many-core CPUs, accelerators (GPUs), interconnects (horizontal communication), and memory hierarchies (vertical communication). A crucial task is designing an algorithm and a programming model that scale to the same order of the HPC system size at multiple levels. This trend in HPC architecture more critically affects memory-intensive appli- cations than compute-bound applications. Accomplishing this task involves adopting less synchronous forms of the mathematical algorithm, reducing synchronization in the computational implementation, introducing more SIMT-style concurrency at the finest level of system hierarchy, and increasing arithmetic intensity as the bottleneck shifts from number of floating-point operations to number of memory accesses. This dissertation addresses these challenges in scientific simulation focusing in the dominant kernels of a memory-bound application: sparse solvers in implicit model- ing, and I/O in explicit reverse time migration in seismic imaging. We introduce asynchronous task-based parallelism into iterative algebraic preconditioners. We also introduce a task-based framework that hides the latency of I/O with computation. This dissertation targets two main applications in the oil and gas industry: reservoir simulation and seismic imaging simulation. It presents results on multi- and many- core systems and GPUs on four Top500 supercomputers: Summit, TSUBAME 3.0, Shaheen II, and Makman-2. We introduce an asynchronous implementation of four major memory-bound kernels: Algebraic multigrid (MPI+OmpSs), tridiagonal solve (MPI+OpenMP), Additive Schwarz Preconditioned Inexact Newton (MPI+MPI), and Reverse Time Migration (StarPU/StarPU+MPI and CUDA).