dc.contributor.author: Mudigere, Dheevatsa
dc.contributor.author: Sridharan, Srinivas
dc.contributor.author: Deshpande, Anand
dc.contributor.author: Park, Jongsoo
dc.contributor.author: Heinecke, Alexander
dc.contributor.author: Smelyanskiy, Mikhail
dc.contributor.author: Kaul, Bharat
dc.contributor.author: Dubey, Pradeep
dc.contributor.author: Kaushik, Dinesh
dc.contributor.author: Keyes, David E.
dc.date.accessioned: 2015-09-10T14:18:37Z
dc.date.available: 2015-09-10T14:18:37Z
dc.date.issued: 2015-05
dc.identifier.doi: 10.1109/IPDPS.2015.114
dc.identifier.uri: http://hdl.handle.net/10754/577110
dc.description.abstract: In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable and limited parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.title: Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems
dc.type: Conference Paper
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Extreme Computing Research Center
dc.identifier.journal: 2015 IEEE International Parallel and Distributed Processing Symposium
dc.contributor.institution: Parallel Computing Lab, Intel Corporation, Bangalore, India
dc.contributor.institution: Parallel Computing Lab, Intel Corporation, Santa Clara, CA
dc.contributor.institution: Qatar Foundation, Doha, Qatar
kaust.person: Keyes, David E.