Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

Handle URI:
http://hdl.handle.net/10754/562189
Title:
Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor
Authors:
Malas, Tareq Majed Yasin (0000-0002-4506-9365); Ahmadia, Aron; Brown, Jed; Gunnels, John A.; Keyes, David E. (0000-0002-4052-7224)
Abstract:
Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the CPU. We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM® Blue Gene®/P supercomputer's PowerPC® 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7× speedup over the best previously published results. © The Author(s) 2012.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Applied Mathematics and Computational Science Program; Extreme Computing Research Center; Core Labs; Computer Science Program
Publisher:
SAGE Publications
Journal:
International Journal of High Performance Computing Applications
Issue Date:
21-May-2012
DOI:
10.1177/1094342012444795
ARXIV:
arXiv:1201.3496
Type:
Article
ISSN:
10943420
Additional Links:
http://arxiv.org/abs/arXiv:1201.3496v1
Appears in Collections:
Articles; Applied Mathematics and Computational Science Program; Extreme Computing Research Center; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Malas, Tareq Majed Yasin | en
dc.contributor.author | Ahmadia, Aron | en
dc.contributor.author | Brown, Jed | en
dc.contributor.author | Gunnels, John A. | en
dc.contributor.author | Keyes, David E. | en
dc.date.accessioned | 2015-08-03T09:46:51Z | en
dc.date.available | 2015-08-03T09:46:51Z | en
dc.date.issued | 2012-05-21 | en
dc.identifier.issn | 10943420 | en
dc.identifier.doi | 10.1177/1094342012444795 | en
dc.identifier.uri | http://hdl.handle.net/10754/562189 | en
dc.description.abstract | Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the CPU. We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM® Blue Gene®/P supercomputer's PowerPC® 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU's instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7× speedup over the best previously published results. © The Author(s) 2012. | en
dc.publisher | SAGE Publications | en
dc.relation.url | http://arxiv.org/abs/arXiv:1201.3496v1 | en
dc.subject | Blue Gene/P | en
dc.subject | code generation | en
dc.subject | high-performance computing | en
dc.subject | performance optimization | en
dc.subject | SIMD | en
dc.title | Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Applied Mathematics and Computational Science Program | en
dc.contributor.department | Extreme Computing Research Center | en
dc.contributor.department | Core Labs | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | International Journal of High Performance Computing Applications | en
dc.contributor.institution | Argonne National Laboratory, Argonne, IL, United States | en
dc.contributor.institution | IBM T.J. Watson Research Center, Yorktown Heights, NY, United States | en
dc.identifier.arxivid | arXiv:1201.3496 | en
kaust.author | Malas, Tareq Majed Yasin | en
kaust.author | Ahmadia, Aron | en
kaust.author | Keyes, David E. | en