StarDB: a large-scale DBMS for strings

Sahli, Majed; Mansour, Essam; Kalnis, Panos

Files

p1844-sahli.pdf (390.72 KB)

Type

Conference Paper

Authors

Sahli, Majed
Mansour, Essam
Kalnis, Panos

KAUST Department

Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Date

2015-08-01

Abstract

Strings, and the applications that use them, are proliferating in science and business. Currently, strings are stored in file systems and processed using ad hoc procedural code. Existing techniques are inflexible and cannot efficiently handle complex queries or large datasets. In this paper, we demonstrate StarDB, a distributed database system for analytics on strings. StarDB hides data and system complexities so that users can focus on analytics. It uses a comprehensive set of parallel string operations and provides a declarative query language for expressing complex queries. StarDB automatically tunes itself and runs with over 90% efficiency on supercomputers, public clouds, clusters, and workstations. We test StarDB using real datasets that are two orders of magnitude larger than those reported in previous work.

Citation

Sahli, M., Mansour, E., & Kalnis, P. (2015). StarDB: a large-scale DBMS for strings. Proceedings of the VLDB Endowment, 8(12), 1844–1847. doi:10.14778/2824032.2824082