Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization

Handle URI:
http://hdl.handle.net/10754/599007
Title:
Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization
Authors:
Belli, Roberto; Hoefler, Torsten
Abstract:
Remote Memory Access (RMA) programming enables direct access to low-level hardware features to achieve high performance for distributed-memory programs. However, the design of RMA programming schemes focuses on memory access and less on synchronization. For example, in contemporary RMA programming systems, the widely used producer-consumer pattern can only be implemented inefficiently, incurring the overhead of an additional round-trip message. We propose Notified Access, a scheme where the target process of an access can receive a completion notification. This scheme enables direct and efficient synchronization with a minimum number of messages. We implement our scheme in an open-source MPI-3 RMA library and demonstrate lower overheads (two cache misses) than other point-to-point synchronization mechanisms for each notification. We also evaluate our implementation on three real-world benchmarks: a stencil computation, a tree computation, and a Cholesky factorization implemented with tasks. Our scheme always performs better than traditional message passing and other existing RMA synchronization schemes, providing up to 50% speedup on small messages. Our analysis shows that Notified Access is a valuable primitive for any RMA system. Furthermore, we provide guidance for the design of low-level network interfaces to support Notified Access efficiently.
Citation:
Belli R, Hoefler T (2015) Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization. 2015 IEEE International Parallel and Distributed Processing Symposium. Available: http://dx.doi.org/10.1109/ipdps.2015.30.
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Journal:
2015 IEEE International Parallel and Distributed Processing Symposium
Issue Date:
May-2015
DOI:
10.1109/ipdps.2015.30
Type:
Conference Paper
Sponsors:
We thank Hatem Ltaief (KAUST) for providing the Cholesky example. We thank the GASPI team for inspiring discussions about RMA interfaces and Christian Simmendinger for numerous clarifications about the GASPI specification. We thank James Dinan (Intel), Jeff Hammond (Intel), Kathy Yelick (LBNL), Edgar Solomonik, Timo Schneider, and Salvatore Di Girolamo for helpful discussions, Larry Kaplan (Cray) for help with uGNI, and the Swiss National Supercomputing Centre (CSCS) for access to Piz Daint.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field: Value (Language)
dc.contributor.author: Belli, Roberto (en)
dc.contributor.author: Hoefler, Torsten (en)
dc.date.accessioned: 2016-02-25T13:51:04Z (en)
dc.date.available: 2016-02-25T13:51:04Z (en)
dc.date.issued: 2015-05 (en)
dc.identifier.citation: Belli R, Hoefler T (2015) Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization. 2015 IEEE International Parallel and Distributed Processing Symposium. Available: http://dx.doi.org/10.1109/ipdps.2015.30. (en)
dc.identifier.doi: 10.1109/ipdps.2015.30 (en)
dc.identifier.uri: http://hdl.handle.net/10754/599007 (en)
dc.description.abstract: Remote Memory Access (RMA) programming enables direct access to low-level hardware features to achieve high performance for distributed-memory programs. However, the design of RMA programming schemes focuses on memory access and less on synchronization. For example, in contemporary RMA programming systems, the widely used producer-consumer pattern can only be implemented inefficiently, incurring the overhead of an additional round-trip message. We propose Notified Access, a scheme where the target process of an access can receive a completion notification. This scheme enables direct and efficient synchronization with a minimum number of messages. We implement our scheme in an open-source MPI-3 RMA library and demonstrate lower overheads (two cache misses) than other point-to-point synchronization mechanisms for each notification. We also evaluate our implementation on three real-world benchmarks: a stencil computation, a tree computation, and a Cholesky factorization implemented with tasks. Our scheme always performs better than traditional message passing and other existing RMA synchronization schemes, providing up to 50% speedup on small messages. Our analysis shows that Notified Access is a valuable primitive for any RMA system. Furthermore, we provide guidance for the design of low-level network interfaces to support Notified Access efficiently. (en)
dc.description.sponsorship: We thank Hatem Ltaief (KAUST) for providing the Cholesky example. We thank the GASPI team for inspiring discussions about RMA interfaces and Christian Simmendinger for numerous clarifications about the GASPI specification. We thank James Dinan (Intel), Jeff Hammond (Intel), Kathy Yelick (LBNL), Edgar Solomonik, Timo Schneider, and Salvatore Di Girolamo for helpful discussions, Larry Kaplan (Cray) for help with uGNI, and the Swiss National Supercomputing Centre (CSCS) for access to Piz Daint. (en)
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) (en)
dc.title: Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization (en)
dc.type: Conference Paper (en)
dc.identifier.journal: 2015 IEEE International Parallel and Distributed Processing Symposium (en)
dc.contributor.institution: Dept. of Computer Science, ETH Zurich (en)