Query Optimizations over Decentralized RDF Graphs

Handle URI:
http://hdl.handle.net/10754/625591
Title:
Query Optimizations over Decentralized RDF Graphs
Authors:
Abdelaziz, Ibrahim (0000-0003-1449-5115); Mansour, Essam; Ouzzani, Mourad; Aboulnaga, Ashraf; Kalnis, Panos (0000-0002-5060-1360)
Abstract:
Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.
KAUST Department:
King Abdullah University of Science & Technology
Citation:
Abdelaziz I, Mansour E, Ouzzani M, Aboulnaga A, Kalnis P (2017) Query Optimizations over Decentralized RDF Graphs. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). Available: http://dx.doi.org/10.1109/ICDE.2017.59.
Publisher:
IEEE
Journal:
2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Conference/Event name:
33rd IEEE International Conference on Data Engineering, ICDE 2017
Issue Date:
18-May-2017
DOI:
10.1109/ICDE.2017.59
Type:
Conference Paper
Additional Links:
http://ieeexplore.ieee.org/document/7929955/
Appears in Collections:
Conference Papers

Full metadata record

DC Field | Value | Language
dc.contributor.author | Abdelaziz, Ibrahim | en
dc.contributor.author | Mansour, Essam | en
dc.contributor.author | Ouzzani, Mourad | en
dc.contributor.author | Aboulnaga, Ashraf | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2017-10-03T12:49:27Z | -
dc.date.available | 2017-10-03T12:49:27Z | -
dc.date.issued | 2017-05-18 | en
dc.identifier.citation | Abdelaziz I, Mansour E, Ouzzani M, Aboulnaga A, Kalnis P (2017) Query Optimizations over Decentralized RDF Graphs. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). Available: http://dx.doi.org/10.1109/ICDE.2017.59. | en
dc.identifier.doi | 10.1109/ICDE.2017.59 | en
dc.identifier.uri | http://hdl.handle.net/10754/625591 | -
dc.description.abstract | Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time. | en
dc.publisher | IEEE | en
dc.relation.url | http://ieeexplore.ieee.org/document/7929955/ | en
dc.title | Query Optimizations over Decentralized RDF Graphs | en
dc.type | Conference Paper | en
dc.contributor.department | King Abdullah University of Science & Technology | en
dc.identifier.journal | 2017 IEEE 33rd International Conference on Data Engineering (ICDE) | en
dc.conference.date | 2017-04-19 to 2017-04-22 | en
dc.conference.name | 33rd IEEE International Conference on Data Engineering, ICDE 2017 | en
dc.conference.location | San Diego, CA, USA | en
dc.contributor.institution | Qatar Computing Research Institute, HBKU | en
kaust.author | Abdelaziz, Ibrahim | en
kaust.author | Kalnis, Panos | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.