ACCTuner: OpenACC Auto-Tuner For Accelerated Scientific Applications

Handle URI:
http://hdl.handle.net/10754/554083
Title:
ACCTuner: OpenACC Auto-Tuner For Accelerated Scientific Applications
Authors:
Alzayer, Fatemah ( 0000-0003-4393-1457 )
Abstract:
We optimize parameters in OpenACC clauses for a stencil evaluation kernel executed on Graphical Processing Units (GPUs) using a variety of machine learning and optimization search algorithms, individually and in hybrid combinations, and compare execution time performance to the best possible obtained from brute force search. Several auto-tuning techniques – historic learning, random walk, simulated annealing, Nelder-Mead, and genetic algorithms – are evaluated over a large two-dimensional parameter space not satisfactorily addressed to date by OpenACC compilers, consisting of gang size and vector length. A hybrid of historic learning and Nelder-Mead delivers the best balance of high performance and low tuning effort. GPUs are employed over an increasing range of applications due to the performance available from their large number of cores, as well as their energy efficiency. However, writing code that takes advantage of their massive fine-grained parallelism requires deep knowledge of the hardware, and is generally a complex task involving program transformation and the selection of many parameters. To improve programmer productivity, the directive-based programming model OpenACC was announced as an industry standard in 2011. Various compilers have been developed to support this model, the most notable being those by Cray, CAPS, and PGI. While the architecture and number of cores have evolved rapidly, the compilers have failed to keep up at configuring the parallel program to run most e ciently on the hardware. Following successful approaches to obtain high performance in kernels for cache-based processors using auto-tuning, we approach this compiler-hardware gap in GPUs by employing auto-tuning for the key parameters “gang” and “vector” in OpenACC clauses. We demonstrate results for a stencil evaluation kernel typical of seismic imaging over a variety of realistically sized three-dimensional grid configurations, with different truncation error orders in the spatial dimensions. Apart from random walk and historic learning based on nearest neighbor in grid size, most of our heuristics, including the one that proves best, appear to be applied in this context for the first time. This work is a stepping-stone towards an OpenACC auto-tuning framework for more general high-performance numerical kernels optimized for GPU computations.
Advisors:
Keyes, David E. ( 0000-0002-4052-7224 )
Committee Member:
Zhang, Xiangliang ( 0000-0002-3574-5665 ) ; Hadwiger, Markus ( 0000-0003-1239-4871 ) ; Feki, Saber
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
17-May-2015
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC FieldValue Language
dc.contributor.advisorKeyes, David E.en
dc.contributor.authorAlzayer, Fatemahen
dc.date.accessioned2015-05-18T14:16:21Zen
dc.date.available2015-05-18T14:16:21Zen
dc.date.issued2015-05-17en
dc.identifier.urihttp://hdl.handle.net/10754/554083en
dc.description.abstractWe optimize parameters in OpenACC clauses for a stencil evaluation kernel executed on Graphical Processing Units (GPUs) using a variety of machine learning and optimization search algorithms, individually and in hybrid combinations, and compare execution time performance to the best possible obtained from brute force search. Several auto-tuning techniques – historic learning, random walk, simulated annealing, Nelder-Mead, and genetic algorithms – are evaluated over a large two-dimensional parameter space not satisfactorily addressed to date by OpenACC compilers, consisting of gang size and vector length. A hybrid of historic learning and Nelder-Mead delivers the best balance of high performance and low tuning effort. GPUs are employed over an increasing range of applications due to the performance available from their large number of cores, as well as their energy efficiency. However, writing code that takes advantage of their massive fine-grained parallelism requires deep knowledge of the hardware, and is generally a complex task involving program transformation and the selection of many parameters. To improve programmer productivity, the directive-based programming model OpenACC was announced as an industry standard in 2011. Various compilers have been developed to support this model, the most notable being those by Cray, CAPS, and PGI. While the architecture and number of cores have evolved rapidly, the compilers have failed to keep up at configuring the parallel program to run most e ciently on the hardware. Following successful approaches to obtain high performance in kernels for cache-based processors using auto-tuning, we approach this compiler-hardware gap in GPUs by employing auto-tuning for the key parameters “gang” and “vector” in OpenACC clauses. We demonstrate results for a stencil evaluation kernel typical of seismic imaging over a variety of realistically sized three-dimensional grid configurations, with different truncation error orders in the spatial dimensions. Apart from random walk and historic learning based on nearest neighbor in grid size, most of our heuristics, including the one that proves best, appear to be applied in this context for the first time. This work is a stepping-stone towards an OpenACC auto-tuning framework for more general high-performance numerical kernels optimized for GPU computations.en
dc.language.isoenen
dc.subjectauto-tuningen
dc.subjectstencilen
dc.subjectsearch algorithmsen
dc.subjectopen ACCen
dc.subjectspeedupen
dc.titleACCTuner: OpenACC Auto-Tuner For Accelerated Scientific Applicationsen
dc.typeThesisen
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Divisionen
thesis.degree.grantorKing Abdullah University of Science and Technologyen_GB
dc.contributor.committeememberZhang, Xiangliangen
dc.contributor.committeememberHadwiger, Markusen
dc.contributor.committeememberFeki, Saberen
thesis.degree.disciplineComputer Scienceen
thesis.degree.nameMaster of Scienceen
dc.person.id128575en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.