
dc.contributor.author: Hadri, Bilel
dc.contributor.author: Parsani, Matteo
dc.contributor.author: Hutchinson, Maxwell
dc.contributor.author: Heinecke, Alexander
dc.contributor.author: Dalcin, Lisandro
dc.contributor.author: Keyes, David E.
dc.date.accessioned: 2019-09-18T14:16:58Z
dc.date.available: 2019-09-18T14:16:58Z
dc.date.issued: 2019
dc.identifier.uri: http://hdl.handle.net/10754/656780
dc.description.abstract: We present in this paper a comprehensive performance study of highly efficient, extreme-scale direct numerical simulations of secondary flows, using an optimized version of Nek5000. Our investigations are conducted on various Cray XC40 systems, using a very high-order spectral element method. Single-node efficiency is achieved by auto-generated assembly implementations of small matrix multiplies and key vector-vector operations, streaming lossless I/O compression, aggressive loop merging, and selective single-precision evaluations. Comparative studies at scale across different Cray XC40 systems, Trinity (LANL), Cori (NERSC), and Shaheen II (KAUST), show that the Cray programming environment, network configuration, parallel file system, and burst buffer all have a major impact on performance. All three systems have similar hardware, with comparable CPU nodes and parallel file systems, but they differ in theoretical network bandwidth, operating system, and programming environment versions. Our study reveals how these seemingly slight configuration differences can be critical to application performance. We also find that using 294,912 cores (9,216 nodes) on the Trinity XC40 sustains petascale performance as well as 50% of peak memory bandwidth over the entire solver (500 TB/s in aggregate). On 3,072 KNL nodes of Cori, we reach 378 TFLOP/s with an aggregated bandwidth of 310 TB/s, corresponding to a time-to-solution 2.11× faster than that obtained with the same number of Haswell nodes.
dc.language.iso: en
dc.publisher: Cray User Group
dc.relation.url: https://cug.org/proceedings/cug2019_proceedings/includes/files/pap130s2-file1.pdf
dc.relation.url: https://cug.org/proceedings/cug2019_proceedings/includes/files/pap130s2-file2.pdf
dc.subject: Cray XC40
dc.subject: Haswell
dc.subject: KNL
dc.subject: Nek5000
dc.subject: Performance Analysis
dc.subject: Regression
dc.subject: Energy Efficiency
dc.title: Performance Study of Sustained Petascale Direct Numerical Simulation on Cray XC40 Systems (Trinity, Shaheen2 and Cori)
dc.type: Conference Paper
dc.type: Presentation
dc.contributor.department: KAUST Supercomputing Lab
dc.contributor.department: Extreme Computing Research Center
dc.description.funding: The research reported in this paper was funded by King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. We are thankful for the computing resources of the Supercomputing Laboratory and the Extreme Computing Research Center at KAUST; the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231; and the Trinity project managed and operated by Los Alamos National Laboratory and Sandia National Laboratories.
dc.conference.date: May 5-9, 2019
dc.conference.name: Cray User Group 2019
dc.conference.location: Montreal, Canada
dc.eprint.version: Post-print
dc.contributor.institution: Citrine Informatics, Redwood City, California, USA
dc.contributor.institution: Intel Corporation, Santa Clara, California, USA
dc.contributor.affiliation: King Abdullah University of Science and Technology (KAUST)
pubs.publication-status: Published
kaust.person: Hadri, Bilel
kaust.person: Parsani, Matteo
kaust.person: Dalcin, Lisandro
kaust.person: Keyes, David E.
refterms.dateFOA: 2020-12-10T05:33:06Z


Files in this item

Name: pap130s2-file1.pdf
Size: 597.4 KB
Format: PDF
Description: Conference Paper

Name: pap130s2-file2.pdf
Size: 26.89 MB
Format: PDF
Description: Presentation Slides
