Evaluation of global synchronization for iterative algebra algorithms on many-core

Handle URI:
http://hdl.handle.net/10754/598248
Title:
Evaluation of global synchronization for iterative algebra algorithms on many-core
Authors:
ul Hasan Khan, Ayaz; Al-Mouhamed, Mayez; Firdaus, Lutfi A.
Abstract:
© 2015 IEEE. Massively parallel computing is applied extensively in various scientific and engineering domains. With the growing interest in many-core architectures and due to the lack of explicit support for inter-block synchronization specifically in GPUs, synchronization becomes necessary to minimize inter-block communication time. In this paper, we have proposed two new inter-block synchronization techniques: 1) Relaxed Synchronization, and 2) Block-Query Synchronization. These schemes are used in implementing numerical iterative solvers where computation/communication overlapping is one optimization used to enhance application performance. We have evaluated and analyzed the performance of the proposed synchronization techniques using the Jacobi iterative solver in comparison to the state-of-the-art inter-block lock-free synchronization techniques. We have achieved about 1-8% performance improvement in terms of execution time over lock-free synchronization depending on the problem size and the number of thread blocks. We have also evaluated the proposed algorithm on GPU and MIC architectures and obtained about 8-26% performance improvement over the barrier synchronization available in the OpenMP programming environment depending on the problem size and the number of cores used.
Citation:
Ul Hasan Khan A, Al-Mouhamed M, Firdaus LA (2015) Evaluation of global synchronization for iterative algebra algorithms on many-core. 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Available: http://dx.doi.org/10.1109/SNPD.2015.7176173.
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Journal:
2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)
Issue Date:
Jun-2015
DOI:
10.1109/SNPD.2015.7176173
Type:
Conference Paper
Sponsors:
This project was funded by the National Plan for Science, Technology and Innovation (MAARIFAH) – King Abdulaziz City for Science and Technology – through the Science and Technology Unit at King Fahd University of Petroleum & Minerals (KFUPM) – the Kingdom of Saudi Arabia, award number (12-INF3008-04). Thanks to King Abdullah University of Science and Technology (KAUST) for providing access to its K20X GPU cluster.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | ul Hasan Khan, Ayaz | en
dc.contributor.author | Al-Mouhamed, Mayez | en
dc.contributor.author | Firdaus, Lutfi A. | en
dc.date.accessioned | 2016-02-25T13:17:21Z | en
dc.date.available | 2016-02-25T13:17:21Z | en
dc.date.issued | 2015-06 | en
dc.identifier.citation | Ul Hasan Khan A, Al-Mouhamed M, Firdaus LA (2015) Evaluation of global synchronization for iterative algebra algorithms on many-core. 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Available: http://dx.doi.org/10.1109/SNPD.2015.7176173. | en
dc.identifier.doi | 10.1109/SNPD.2015.7176173 | en
dc.identifier.uri | http://hdl.handle.net/10754/598248 | en
dc.description.abstract | © 2015 IEEE. Massively parallel computing is applied extensively in various scientific and engineering domains. With the growing interest in many-core architectures and due to the lack of explicit support for inter-block synchronization specifically in GPUs, synchronization becomes necessary to minimize inter-block communication time. In this paper, we have proposed two new inter-block synchronization techniques: 1) Relaxed Synchronization, and 2) Block-Query Synchronization. These schemes are used in implementing numerical iterative solvers where computation/communication overlapping is one optimization used to enhance application performance. We have evaluated and analyzed the performance of the proposed synchronization techniques using the Jacobi iterative solver in comparison to the state-of-the-art inter-block lock-free synchronization techniques. We have achieved about 1-8% performance improvement in terms of execution time over lock-free synchronization depending on the problem size and the number of thread blocks. We have also evaluated the proposed algorithm on GPU and MIC architectures and obtained about 8-26% performance improvement over the barrier synchronization available in the OpenMP programming environment depending on the problem size and the number of cores used. | en
dc.description.sponsorship | This project was funded by the National Plan for Science, Technology and Innovation (MAARIFAH) – King Abdulaziz City for Science and Technology – through the Science and Technology Unit at King Fahd University of Petroleum & Minerals (KFUPM) – the Kingdom of Saudi Arabia, award number (12-INF3008-04). Thanks to King Abdullah University of Science and Technology (KAUST) for providing access to its K20X GPU cluster. | en
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en
dc.subject | CUDA | en
dc.subject | GPU | en
dc.subject | Inter-Block Synchronization | en
dc.subject | Jacobi Iterative Method Graphics Processing Unit (GPU) | en
dc.subject | OpenMP | en
dc.subject | Xeon Phi | en
dc.title | Evaluation of global synchronization for iterative algebra algorithms on many-core | en
dc.type | Conference Paper | en
dc.identifier.journal | 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) | en
dc.contributor.institution | King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia | en