High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers
dc.contributor.author | Kamalakkannan, Kamalavasan
dc.contributor.author | Mudalige, Gihan R.
dc.contributor.author | Reguly, Istvan Z.
dc.contributor.author | Fahmy, Suhaib A.
dc.date.accessioned | 2021-01-13T05:57:07Z
dc.date.available | 2021-01-13T05:57:07Z
dc.date.issued | 2021-05-17
dc.identifier.uri | http://hdl.handle.net/10754/666880
dc.description.abstract | This paper presents a workflow for synthesizing near-optimal FPGA implementations of structured-mesh-based stencil applications for explicit solvers. It leverages key characteristics of the application class, its computation-communication pattern, and the architectural capabilities of the FPGA to accelerate solvers for high-performance computing applications. Key new features of the workflow are (1) the unification of standard state-of-the-art techniques with a number of high-gain optimizations, such as batching and spatial blocking/tiling, motivated by increasing throughput for real-world workloads, and (2) the development and use of a predictive analytical model to explore the design space and obtain resource and performance estimates. Three representative applications are implemented using the design workflow on a Xilinx Alveo U280 FPGA, demonstrating near-optimal performance and over 85% predictive model accuracy. These are compared with equivalent highly optimized implementations of the same applications on modern HPC-grade GPUs (Nvidia V100), analyzing time to solution, bandwidth, and energy consumption. Performance results indicate runtimes comparable to those of the V100 GPU, with over 2× energy savings on the FPGA for the largest non-trivial application. Our investigation shows the challenges of achieving high performance on current-generation FPGAs compared to traditional architectures. We discuss the determinants that make a given stencil code amenable to FPGA implementation, providing insights into the feasibility and profitability of a design and its resulting performance.
dc.publisher | IEEE
dc.rights | Archived with thanks to IEEE
dc.subject | FPGAs
dc.subject | Stencil Applications
dc.subject | Explicit solvers
dc.title | High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers
dc.type | Conference Paper
dc.contributor.department | King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
dc.conference.date | 17–21 May 2021
dc.conference.name | IEEE International Parallel and Distributed Processing Symposium
dc.conference.location | Portland, OR
dc.eprint.version | Post-print
dc.contributor.institution | Department of Computer Science, University of Warwick, UK
dc.contributor.institution | Faculty of Information Technology and Bionics, Pazmany Peter Catholic University, Hungary
pubs.publication-status | Accepted
dc.identifier.arxivid | arxiv.org/pdf/2101.01177
kaust.person | Fahmy, Suhaib A.
refterms.dateFOA | 2021-01-13T05:57:08Z |