An Adaptive SPARQL Engine with Dynamic Partitioning for Distributed RDF Repositories
Authors: Ibrahim, Yasser E.
Embargo End Date: 2015-01-01
Permanent link to this record: http://hdl.handle.net/10754/248731
Access Restrictions: At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis became available to the public after the expiration of the embargo on 2015-01-01.
Abstract: The tremendous increase in semantic data is driving the demand for efficient query engines. RDF data, which is being generated at an unprecedented rate, poses storage, indexing, and querying challenges. Due to the size of the data and the federated nature of the Semantic Web, it is in many cases impractical to assume a central repository, and more attention is being given to distributed RDF stores. This work is motivated by two major drawbacks of current solutions: 1) the pre-processing phase is very expensive and takes a prohibitively long time for large datasets, and 2) current distributed systems assume that a static partitioning of the data performs well for all kinds of queries, and they do not consider fluctuations in the query load. In this paper we propose PHD-Store, an in-memory SPARQL engine for distributed RDF repositories. Our system does not assume any particular initial placement of the data and does not require pre-processing before running the first query. It analyzes incoming queries and adjusts data placement dynamically so that communication among repositories is minimized for future queries. To achieve this flexibility, frequent query patterns are detected, and data are redistributed through a Propagating Hash Distribution (PHD) algorithm to ensure optimal placement for frequent query patterns. Our experiments with large RDF graphs verify that PHD-Store scales well and executes complex queries more efficiently than existing systems.
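The abstract names a Propagating Hash Distribution step but does not spell it out. The following is a minimal, hypothetical sketch of the general idea under stated assumptions, not the thesis's actual algorithm. It assumes triples are `(subject, predicate, object)` string tuples and that a frequent query pattern can be described by a core vertex plus outgoing predicate paths (the hypothetical parameter `core_predicate_paths`). Each core vertex is hashed to a worker, and that worker ID is propagated along the pattern's edges so every triple in that pattern instance ends up on the same worker. The helper names `worker_of` and `propagate_hash_distribution` are invented for illustration. The sketch also replicates triples rather than moving them; whether PHD-Store moves or replicates data is not stated here.

```python
from collections import defaultdict
from zlib import crc32


def worker_of(vertex, n_workers):
    # Deterministic hash (Python's built-in hash() is salted per process for str).
    return crc32(vertex.encode("utf-8")) % n_workers


def propagate_hash_distribution(triples, core_predicate_paths, n_workers):
    """Hypothetical sketch: colocate each frequent-pattern instance on one worker.

    triples: iterable of (subject, predicate, object) tuples.
    core_predicate_paths: list of predicate tuples; each tuple is an outgoing
        path (subject -> object) that starts at the pattern's core vertex.
    Returns a dict mapping worker id -> set of triples stored on that worker.
    """
    triples = list(triples)

    # Index outgoing edges by (subject, predicate) for path traversal.
    out = defaultdict(list)
    for s, p, o in triples:
        out[(s, p)].append(o)

    # Baseline placement (an assumption): hash every triple by its subject.
    placement = defaultdict(set)
    for t in triples:
        placement[worker_of(t[0], n_workers)].add(t)

    # Candidate core vertices: subjects of an edge matching a path's first predicate.
    first_preds = {path[0] for path in core_predicate_paths if path}
    cores = {s for s, p, _ in triples if p in first_preds}

    for core in cores:
        w = worker_of(core, n_workers)  # the core's hash decides the target worker
        for path in core_predicate_paths:
            frontier = [core]
            for pred in path:
                nxt = []
                for v in frontier:
                    for o in out.get((v, pred), ()):
                        # Propagate the core's worker id: copy this edge onto worker w.
                        placement[w].add((v, pred, o))
                        nxt.append(o)
                frontier = nxt
    return placement
```

For example, with a frequent path pattern `?x advisor ?y . ?y worksFor ?z`, calling `propagate_hash_distribution(triples, [("advisor", "worksFor")], 4)` places both edges of each matching path on the worker chosen by hashing `?x`, so that pattern can be answered without inter-worker communication. The cost is extra storage for the replicated triples.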