
dc.contributor.author: Al Farhan, Mohammed
dc.contributor.author: Keyes, David E.
dc.date.accessioned: 2018-04-30T06:58:23Z
dc.date.available: 2018-04-30T06:58:23Z
dc.date.issued: 2018-04-13
dc.identifier.citation: Al Farhan MA, Keyes D (2018) Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures. IEEE Transactions on Parallel and Distributed Systems: 1–1. Available: http://dx.doi.org/10.1109/TPDS.2018.2826533.
dc.identifier.issn: 1045-9219
dc.identifier.doi: 10.1109/TPDS.2018.2826533
dc.identifier.uri: http://hdl.handle.net/10754/627692
dc.description.abstract: We investigate several state-of-the-practice shared-memory optimization techniques applied to key routines of an unstructured computational aerodynamics application with irregular memory accesses. We illustrate for the Intel KNL processor, as a representative of the processors in contemporary leading supercomputers, identifying and addressing performance challenges without compromising the floating point numerics of the original code. We employ low and high-level architecture-specific code optimizations involving thread and data-level parallelism. Our approach is based upon a multi-level hierarchical distribution of work and data across both the threads and the SIMD units within every hardware core. On a 64-core KNL chip, we achieve nearly 2.9x speedup of the dominant routines relative to the baseline. These exhibit almost linear strong scalability up to 64 threads, and thereafter some improvement with hyperthreading. At substantially fewer Watts, we achieve up to 1.7x speedup relative to the performance of 72 threads of a 36-core Haswell CPU and roughly equivalent performance to 112 threads of a 56-core Skylake scalable processor. These optimizations are expected to be of value for many other unstructured mesh PDE-based scientific applications as multi and many-core architecture evolves.
dc.description.sponsorship: Support in the form of computing resources was provided by KAUST Extreme Computing Research Center, KAUST Supercomputing Laboratory, KAUST Information Technology Research Division, and Intel Parallel Computing Centers.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.url: https://ieeexplore.ieee.org/document/8337750/
dc.rights: (c) 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
dc.subject: Aerodynamics
dc.subject: AVX-512
dc.subject: Computational aerodynamics
dc.subject: Computational modeling
dc.subject: Computer architecture
dc.subject: Data-level parallelism
dc.subject: Hardware
dc.subject: Intel Xeon Phi
dc.subject: Kernel
dc.subject: Knights Landing
dc.subject: Optimization
dc.subject: Parallel processing
dc.subject: Performance optimization
dc.subject: SIMD
dc.subject: Thread-level parallelism
dc.subject: Unstructured meshes
dc.title: Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures
dc.type: Article
dc.contributor.department: Applied Mathematics and Computational Science Program
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: ECRC, KAUST, Jeddah, Jeddah Saudi Arabia
dc.contributor.department: Extreme Computing Research Center
dc.identifier.journal: IEEE Transactions on Parallel and Distributed Systems
dc.eprint.version: Post-print
kaust.person: Al Farhan, Mohammed
kaust.person: Keyes, David E.
refterms.dateFOA: 2018-06-14T07:29:43Z
dc.date.published-online: 2018-04-13
dc.date.published-print: 2018


Files in this item

Name: 08337750.pdf
Size: 602.0Kb
Format: PDF
Description: Accepted Manuscript
