Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

Handle URI:
http://hdl.handle.net/10754/577110
Title:
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems
Authors:
Mudigere, Dheevatsa; Sridharan, Srinivas; Deshpande, Anand; Park, Jongsoo; Heinecke, Alexander; Smelyanskiy, Mikhail; Kaul, Bharat; Dubey, Pradeep; Kaushik, Dinesh; Keyes, David E. ( 0000-0002-4052-7224 )
Abstract:
In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable and limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Applied Mathematics and Computational Science Program; Extreme Computing Research Center
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Journal:
2015 IEEE International Parallel and Distributed Processing Symposium
Issue Date:
May-2015
DOI:
10.1109/IPDPS.2015.114
Type:
Conference Paper
Appears in Collections:
Conference Papers; Applied Mathematics and Computational Science Program; Extreme Computing Research Center; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Mudigere, Dheevatsa | en
dc.contributor.author | Sridharan, Srinivas | en
dc.contributor.author | Deshpande, Anand | en
dc.contributor.author | Park, Jongsoo | en
dc.contributor.author | Heinecke, Alexander | en
dc.contributor.author | Smelyanskiy, Mikhail | en
dc.contributor.author | Kaul, Bharat | en
dc.contributor.author | Dubey, Pradeep | en
dc.contributor.author | Kaushik, Dinesh | en
dc.contributor.author | Keyes, David E. | en
dc.date.accessioned | 2015-09-10T14:18:37Z | en
dc.date.available | 2015-09-10T14:18:37Z | en
dc.date.issued | 2015-05 | en
dc.identifier.doi | 10.1109/IPDPS.2015.114 | en
dc.identifier.uri | http://hdl.handle.net/10754/577110 | en
dc.description.abstract | In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable and limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes. | en
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en
dc.title | Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Applied Mathematics and Computational Science Program | en
dc.contributor.department | Extreme Computing Research Center | en
dc.identifier.journal | 2015 IEEE International Parallel and Distributed Processing Symposium | en
dc.contributor.institution | Parallel Computing Lab, Intel Corporation, Bangalore, India | en
dc.contributor.institution | Parallel Computing Lab, Intel Corporation, Santa Clara, CA | en
dc.contributor.institution | Qatar Foundation, Doha, Qatar | en
kaust.author | Keyes, David E. | en