Show simple item record

dc.contributor.advisor: Keyes, David E.
dc.contributor.author: Alturkestani, Tariq
dc.date.accessioned: 2020-10-01T10:30:08Z
dc.date.available: 2020-10-01T10:30:08Z
dc.date.issued: 2020-09-30
dc.identifier.citation: Alturkestani, T. (2020). Maximizing I/O Bandwidth for Out-of-Core HPC Applications on Homogeneous and Heterogeneous Large-Scale Systems. KAUST Research Repository. https://doi.org/10.25781/KAUST-L1536
dc.identifier.doi: 10.25781/KAUST-L1536
dc.identifier.uri: http://hdl.handle.net/10754/665396
dc.description.abstract: Out-of-core simulation systems often produce a massive amount of data that cannot fit in the aggregate fast memory of the compute nodes, and they also require reading these data back for computation. As a result, I/O data movement can be a bottleneck in large-scale simulations. Advances in memory architecture have made it feasible and affordable to integrate hierarchical storage media on large-scale systems, starting from the traditional Parallel File Systems (PFSs) to intermediate fast disk technologies (e.g., node-local and remote-shared NVMe and SSD-based Burst Buffers) and up to CPU main memory and GPU High Bandwidth Memory (HBM). However, while adding additional and faster storage media increases I/O bandwidth, it pressures the CPU, which becomes responsible for managing and moving data between these layers of storage. Simulation systems are thus vulnerable to being blocked by I/O operations. The Multilayer Buffer System (MLBS) proposed in this research demonstrates a general and versatile method for overlapping I/O with computation that helps ameliorate the strain on the processors through asynchronous access. The main idea is to decouple I/O operations from computational phases by using dedicated hardware resources to perform the expensive context switches. MLBS monitors I/O traffic in each storage layer, allowing fair utilization of shared resources. By continually prefetching up and down across all hardware layers of the memory and storage subsystems, MLBS transforms the original I/O-bound behavior of evaluated applications and shifts it closer to a memory-bound or compute-bound regime. The evaluation on the Cray XC40 Shaheen-2 supercomputer for a representative I/O-bound application, seismic inversion, shows that MLBS outperforms the state-of-the-art I/O systems Lustre, Data Elevator, and DataWarp by 6.06X, 2.23X, and 1.90X, respectively.
On the IBM-built Summit supercomputer, using 2048 compute nodes equipped with a total of 12288 GPUs, MLBS achieves up to 1.4X performance speedup compared to the reference PFS-based implementation. MLBS is also demonstrated on applications from cosmology and combustion, and on classic out-of-core computational physics and linear algebra routines.
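The abstract's core idea, decoupling I/O from computation with a dedicated resource that prefetches data into a faster layer ahead of the compute phase, can be illustrated with a minimal sketch. This is a hypothetical illustration of the general producer/consumer buffering pattern, not the actual MLBS implementation; the chunk generator stands in for a slower storage layer such as a PFS.

```python
import queue
import threading

def produce_chunks(n):
    # Stand-in for reading chunks from a slower storage layer (e.g., a PFS).
    for i in range(n):
        yield [i] * 4  # a "chunk" of data

def prefetcher(chunks, buf):
    # Dedicated resource: pushes chunks into the fast buffer ahead of compute.
    for chunk in chunks:
        buf.put(chunk)   # blocks when the buffer is full (backpressure)
    buf.put(None)        # sentinel: no more data

def compute(buf):
    # The compute phase never issues I/O itself; it only drains the buffer,
    # so I/O latency is overlapped with computation.
    total = 0
    while (chunk := buf.get()) is not None:
        total += sum(chunk)  # stand-in for the computational phase
    return total

buf = queue.Queue(maxsize=2)  # small bounded buffer between the two layers
t = threading.Thread(target=prefetcher, args=(produce_chunks(8), buf))
t.start()
result = compute(buf)
t.join()
print(result)
```

The bounded queue mirrors the role of an intermediate buffer layer: the prefetcher runs ahead only as far as the buffer capacity allows, so fast memory is never oversubscribed while the consumer stays busy.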
dc.language.iso: en
dc.subject: hpc
dc.subject: data
dc.subject: I/O
dc.subject: supercomputer
dc.subject: Burst Buffer
dc.subject: Heterogeneous Computing
dc.title: Maximizing I/O Bandwidth for Out-of-Core HPC Applications on Homogeneous and Heterogeneous Large-Scale Systems
dc.type: Dissertation
dc.contributor.department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.rights.embargodate: 2021-10-01
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Shihada, Basem
dc.contributor.committeemember: Moshkov, Mikhail
dc.contributor.committeemember: Sun, Xian-He
thesis.degree.discipline: Computer Science
thesis.degree.name: Doctor of Philosophy
dc.rights.accessrights: At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation will become available to the public after the expiration of the embargo on 2021-10-01.
kaust.request.doi: yes


Files in this item

Name: Alturkestani_Tariq_Dissertation_Sept27.pdf
Size: 3.109Mb
Format: PDF
