Querying and Mining Strings Made Easy

Handle URI:
http://hdl.handle.net/10754/626127
Title:
Querying and Mining Strings Made Easy
Authors:
Sahli, Majed; Mansour, Essam; Kalnis, Panos (ORCID: 0000-0002-5060-1360)
Abstract:
With the advent of large string datasets in several scientific and business applications, there is a growing need to perform ad-hoc analysis on strings. Currently, strings are stored, managed, and queried using procedural codes. This limits users to certain operations supported by existing procedural applications and requires manual query planning with limited tuning opportunities. This paper presents StarQL, a generic and declarative query language for strings. StarQL is based on a native string data model that allows StarQL to support a large variety of string operations and provide semantic-based query optimization. String analytic queries are too intricate to be solved on one machine. Therefore, we propose a scalable and efficient data structure that allows StarQL implementations to handle large sets of strings and utilize large computing infrastructures. Our evaluation shows that StarQL is able to express workloads of application-specific tools, such as BLAST and KAT in bioinformatics, and to mine Wikipedia text for interesting patterns using declarative queries. Furthermore, the StarQL query optimizer shows an order of magnitude reduction in query execution time.
KAUST Department:
KAUST, Thuwal, Saudi Arabia
Citation:
Sahli M, Mansour E, Kalnis P (2017) Querying and Mining Strings Made Easy. Lecture Notes in Computer Science: 3–17. Available: http://dx.doi.org/10.1007/978-3-319-69179-4_1.
Publisher:
Springer International Publishing
Journal:
Advanced Data Mining and Applications
Issue Date:
13-Oct-2017
DOI:
10.1007/978-3-319-69179-4_1
Type:
Book Chapter
ISSN:
0302-9743; 1611-3349
Additional Links:
https://link.springer.com/chapter/10.1007%2F978-3-319-69179-4_1
Appears in Collections:
Book Chapters

Full metadata record

DC Field | Value | Language
dc.contributor.author | Sahli, Majed | en
dc.contributor.author | Mansour, Essam | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2017-11-06T10:47:45Z | -
dc.date.available | 2017-11-06T10:47:45Z | -
dc.date.issued | 2017-10-13 | en
dc.identifier.citation | Sahli M, Mansour E, Kalnis P (2017) Querying and Mining Strings Made Easy. Lecture Notes in Computer Science: 3–17. Available: http://dx.doi.org/10.1007/978-3-319-69179-4_1. | en
dc.identifier.issn | 0302-9743 | en
dc.identifier.issn | 1611-3349 | en
dc.identifier.doi | 10.1007/978-3-319-69179-4_1 | en
dc.identifier.uri | http://hdl.handle.net/10754/626127 | -
dc.description.abstract | With the advent of large string datasets in several scientific and business applications, there is a growing need to perform ad-hoc analysis on strings. Currently, strings are stored, managed, and queried using procedural codes. This limits users to certain operations supported by existing procedural applications and requires manual query planning with limited tuning opportunities. This paper presents StarQL, a generic and declarative query language for strings. StarQL is based on a native string data model that allows StarQL to support a large variety of string operations and provide semantic-based query optimization. String analytic queries are too intricate to be solved on one machine. Therefore, we propose a scalable and efficient data structure that allows StarQL implementations to handle large sets of strings and utilize large computing infrastructures. Our evaluation shows that StarQL is able to express workloads of application-specific tools, such as BLAST and KAT in bioinformatics, and to mine Wikipedia text for interesting patterns using declarative queries. Furthermore, the StarQL query optimizer shows an order of magnitude reduction in query execution time. | en
dc.publisher | Springer International Publishing | en
dc.relation.url | https://link.springer.com/chapter/10.1007%2F978-3-319-69179-4_1 | en
dc.rights | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-69179-4_1 | en
dc.title | Querying and Mining Strings Made Easy | en
dc.type | Book Chapter | en
dc.contributor.department | KAUST, Thuwal, Saudi Arabia | en
dc.identifier.journal | Advanced Data Mining and Applications | en
dc.eprint.version | Post-print | en
dc.contributor.institution | Saudi Aramco, Dhahran, Saudi Arabia | en
dc.contributor.institution | Qatar Computing Research Institute, HBKU, Doha, Qatar | en
kaust.author | Kalnis, Panos | en