dc.contributor.author: Abdelaziz, Ibrahim
dc.contributor.author: Al-Harbi, Razen
dc.contributor.author: Khayyat, Zuhair
dc.contributor.author: Kalnis, Panos
dc.date.accessioned: 2017-11-29T11:13:54Z
dc.date.available: 2017-11-29T11:13:54Z
dc.date.issued: 2017-10-19
dc.identifier.citation: Abdelaziz I, Harbi R, Khayyat Z, Kalnis P (2017) A survey and experimental comparison of distributed SPARQL engines for very large RDF data. Proceedings of the VLDB Endowment 10: 2049–2060. Available: http://dx.doi.org/10.14778/3151106.3151109.
dc.identifier.issn: 2150-8097
dc.identifier.doi: 10.14778/3151106.3151109
dc.identifier.uri: http://hdl.handle.net/10754/626221
dc.description.abstract: Distributed SPARQL engines promise to support very large RDF datasets by utilizing shared-nothing computer clusters. Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning. These systems exhibit a variety of trade-offs that are not well-understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 22 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing and categorize them by several characteristics. Then, we select 12 representative systems and perform extensive experimental evaluation with respect to preprocessing cost, query performance, scalability and workload adaptability, using a variety of synthetic and real large datasets with up to 4.3 billion triples. Our results provide valuable insights for practitioners to understand the trade-offs for their usage scenarios. Finally, we publish online our evaluation framework, including all datasets and workloads, for researchers to compare their novel systems against the existing ones.
dc.publisher: VLDB Endowment
dc.relation.url: https://dl.acm.org/citation.cfm?doid=3151106.3151109
dc.rights: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing info@vldb.org.
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: A survey and experimental comparison of distributed SPARQL engines for very large RDF data
dc.type: Article
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.identifier.journal: Proceedings of the VLDB Endowment
dc.eprint.version: Publisher's Version/PDF
dc.contributor.institution: Saudi Aramco
kaust.person: Abdelaziz, Ibrahim
kaust.person: Khayyat, Zuhair
kaust.person: Kalnis, Panos
refterms.dateFOA: 2018-06-14T02:21:43Z
dc.date.published-online: 2017-10-19
dc.date.published-print: 2017-09-01


Files in this item

Name: p2049-abdelaziz.pdf
Size: 988.0Kb
Format: PDF
Description: Published version
