
dc.contributor.advisor: Keyes, David E.
dc.contributor.author: Alzayer, Fatemah
dc.date.accessioned: 2015-05-18T14:16:21Z
dc.date.available: 2015-05-18T14:16:21Z
dc.date.issued: 2015-05-17
dc.identifier.doi: 10.25781/KAUST-K7EKW
dc.identifier.uri: http://hdl.handle.net/10754/554083
dc.description.abstract: We optimize parameters in OpenACC clauses for a stencil evaluation kernel executed on Graphics Processing Units (GPUs) using a variety of machine learning and optimization search algorithms, individually and in hybrid combinations, and compare the resulting execution times with the best obtainable by brute-force search. Several auto-tuning techniques (historic learning, random walk, simulated annealing, Nelder-Mead, and genetic algorithms) are evaluated over a large two-dimensional parameter space, consisting of gang size and vector length, that OpenACC compilers have not satisfactorily addressed to date. A hybrid of historic learning and Nelder-Mead delivers the best balance of high performance and low tuning effort. GPUs are employed over an increasing range of applications due to the performance available from their large number of cores, as well as their energy efficiency. However, writing code that takes advantage of their massive fine-grained parallelism requires deep knowledge of the hardware, and is generally a complex task involving program transformation and the selection of many parameters. To improve programmer productivity, the directive-based programming model OpenACC was announced as an industry standard in 2011. Various compilers have been developed to support this model, the most notable being those by Cray, CAPS, and PGI. While the architecture and number of cores have evolved rapidly, the compilers have failed to keep up in configuring the parallel program to run most efficiently on the hardware. Following successful approaches to obtaining high performance in kernels for cache-based processors using auto-tuning, we approach this compiler-hardware gap in GPUs by employing auto-tuning for the key parameters “gang” and “vector” in OpenACC clauses.
We demonstrate results for a stencil evaluation kernel typical of seismic imaging over a variety of realistically sized three-dimensional grid configurations, with different truncation error orders in the spatial dimensions. Apart from random walk and historic learning based on nearest neighbor in grid size, most of our heuristics, including the one that proves best, appear to be applied in this context for the first time. This work is a stepping-stone towards an OpenACC auto-tuning framework for more general high-performance numerical kernels optimized for GPU computations.
dc.language.iso: en
dc.subject: auto-tuning
dc.subject: stencil
dc.subject: search algorithms
dc.subject: open ACC
dc.subject: speedup
dc.title: ACCTuner: OpenACC Auto-Tuner For Accelerated Scientific Applications
dc.type: Thesis
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Zhang, Xiangliang
dc.contributor.committeemember: Hadwiger, Markus
dc.contributor.committeemember: Feki, Saber
thesis.degree.discipline: Computer Science
thesis.degree.name: Master of Science
refterms.dateFOA: 2018-06-13T11:19:15Z
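
The tuning setup summarized in the abstract, a search over gang size and vector length judged against brute force, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the candidate values in `GANGS` and `VECTORS`, the `synthetic_time` cost function (a stand-in for timing a real kernel, not measured data), and the greedy random-walk variant, which may differ from the one used in the thesis.

```python
import random

# Hypothetical search space: candidate gang sizes and vector lengths.
GANGS = [2 ** k for k in range(4, 11)]     # 16, 32, ..., 1024
VECTORS = [32 * k for k in range(1, 33)]   # 32, 64, ..., 1024


def synthetic_time(gang, vector):
    """Stand-in for timing one kernel run; NOT measured data."""
    return (gang - 256) ** 2 / 1e4 + (vector - 128) ** 2 / 1e3 + 1.0


def brute_force(cost):
    """Evaluate every configuration; return the best (time, gang, vector)."""
    return min((cost(g, v), g, v) for g in GANGS for v in VECTORS)


def random_walk(cost, steps=200, seed=0):
    """Greedy random walk: step to a random grid neighbour only if it is faster."""
    rng = random.Random(seed)
    i, j = rng.randrange(len(GANGS)), rng.randrange(len(VECTORS))
    best = cost(GANGS[i], VECTORS[j])
    evaluations = 1
    for _ in range(steps):
        di, dj = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        ni, nj = i + di, j + dj
        if not (0 <= ni < len(GANGS) and 0 <= nj < len(VECTORS)):
            continue  # proposed step leaves the grid; skip it
        t = cost(GANGS[ni], VECTORS[nj])
        evaluations += 1
        if t < best:
            i, j, best = ni, nj, t
    return best, GANGS[i], VECTORS[j], evaluations


if __name__ == "__main__":
    print("brute force:", brute_force(synthetic_time))
    print("random walk:", random_walk(synthetic_time))
```

Brute force here costs one evaluation per grid point (7 × 32 = 224), while the walk is capped at 201 evaluations. In a real tuner each evaluation would be a compile-and-run of the kernel, which is why the tuning effort matters alongside the final execution time.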


Files in this item

Name: Thesis.pdf
Size: 2.990Mb
Format: PDF
