An Adaptive SPARQL Engine with Dynamic Partitioning for Distributed RDF Repositories

Handle URI:
http://hdl.handle.net/10754/248731
Title:
An Adaptive SPARQL Engine with Dynamic Partitioning for Distributed RDF Repositories
Authors:
Ibrahim, Yasser E.
Abstract:
The tremendous increase in semantic data is driving the demand for efficient query engines. RDF data, generated at an unprecedented rate, poses storage, indexing, and querying challenges. Due to the size of the data and the federated nature of the semantic web, it is in many cases impractical to assume a central repository, and more attention is being given to distributed RDF stores. This work is motivated by two major drawbacks of current solutions: 1) the pre-processing phase is very expensive and takes a prohibitively long time for large datasets, and 2) current distributed systems assume that a static partitioning of the data should perform well for all kinds of queries, and do not consider fluctuations in the query load. In this paper we propose PHD-Store, an in-memory SPARQL engine for distributed RDF repositories. Our system does not assume any particular initial placement of the data and does not require pre-processing before running the first query. It analyzes incoming queries and adjusts data placement dynamically so that communication among repositories is minimized for future queries. To achieve this flexibility, frequent query patterns are detected, and data are redistributed through a Propagating Hash Distribution (PHD) algorithm to ensure optimal placement for frequent query patterns. Our experiments with large RDF graphs verify that PHD-Store scales well and executes complex queries more efficiently than existing systems.
Advisors:
Kalnis, Panos ( 0000-0002-5060-1360 )
Committee Member:
Shihada, Basem ( 0000-0003-4434-4334 ) ; Skiadopoulos, Spiros
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
Jul-2012
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Kalnis, Panos | en
dc.contributor.author | Ibrahim, Yasser E. | en
dc.date.accessioned | 2012-10-14T08:36:05Z | -
dc.date.available | 2012-10-14T08:36:05Z | -
dc.date.issued | 2012-07 | en
dc.identifier.uri | http://hdl.handle.net/10754/248731 | en
dc.language.iso | en | en
dc.title | An Adaptive SPARQL Engine with Dynamic Partitioning for Distributed RDF Repositories | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Shihada, Basem | en
dc.contributor.committeemember | Skiadopoulos, Spiros | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en
dc.person.id | 101734 | en