Fast and scalable inequality joins

Handle URI:
http://hdl.handle.net/10754/622197
Title:
Fast and scalable inequality joins
Authors:
Khayyat, Zuhair (0000-0003-3650-6997); Lucia, William; Singh, Meghna; Ouzzani, Mourad; Papotti, Paolo; Quiané-Ruiz, Jorge Arnulfo; Tang, Nan; Kalnis, Panos (0000-0002-5060-1360)
Abstract:
Inequality joins, which join relations using inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research, ranging from efficient join algorithms such as sort-merge join to the use of efficient indices such as the B+-tree, R*-tree and Bitmap. However, inequality joins have received little attention, and queries containing such joins are notably slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins, which is then used to optimize multiple-predicate queries and multi-way joins. Moreover, we study an incremental inequality join algorithm to handle scenarios where data keeps changing. We have implemented a centralized version of these algorithms on top of PostgreSQL, a distributed version on top of Spark SQL, and a version in an existing data cleaning system, Nadeef. By comparing our algorithms against well-known optimization techniques for inequality joins, we show that our solution is more scalable and several orders of magnitude faster. © 2016 Springer-Verlag Berlin Heidelberg
KAUST Department:
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Citation:
Khayyat Z, Lucia W, Singh M, Ouzzani M, Papotti P, et al. (2016) Fast and scalable inequality joins. The VLDB Journal. Available: http://dx.doi.org/10.1007/s00778-016-0441-6.
Publisher:
Springer Nature
Journal:
The VLDB Journal
Issue Date:
7-Sep-2016
DOI:
10.1007/s00778-016-0441-6
Type:
Article
ISSN:
1066-8888; 0949-877X
Sponsors:
Portions of the research in this paper used the MDC Database made available by Idiap Research Institute, Switzerland and owned by Nokia.
Additional Links:
http://link.springer.com/article/10.1007%2Fs00778-016-0441-6
Appears in Collections:
Articles

Full metadata record

DC Field | Value | Language
dc.contributor.author | Khayyat, Zuhair | en
dc.contributor.author | Lucia, William | en
dc.contributor.author | Singh, Meghna | en
dc.contributor.author | Ouzzani, Mourad | en
dc.contributor.author | Papotti, Paolo | en
dc.contributor.author | Quiané-Ruiz, Jorge Arnulfo | en
dc.contributor.author | Tang, Nan | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2017-01-02T08:42:37Z | -
dc.date.available | 2017-01-02T08:42:37Z | -
dc.date.issued | 2016-09-07 | en
dc.identifier.citation | Khayyat Z, Lucia W, Singh M, Ouzzani M, Papotti P, et al. (2016) Fast and scalable inequality joins. The VLDB Journal. Available: http://dx.doi.org/10.1007/s00778-016-0441-6. | en
dc.identifier.issn | 1066-8888 | en
dc.identifier.issn | 0949-877X | en
dc.identifier.doi | 10.1007/s00778-016-0441-6 | en
dc.identifier.uri | http://hdl.handle.net/10754/622197 | -
dc.description.abstract | Inequality joins, which join relations using inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research, ranging from efficient join algorithms such as sort-merge join to the use of efficient indices such as the B+-tree, R*-tree and Bitmap. However, inequality joins have received little attention, and queries containing such joins are notably slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins, which is then used to optimize multiple-predicate queries and multi-way joins. Moreover, we study an incremental inequality join algorithm to handle scenarios where data keeps changing. We have implemented a centralized version of these algorithms on top of PostgreSQL, a distributed version on top of Spark SQL, and a version in an existing data cleaning system, Nadeef. By comparing our algorithms against well-known optimization techniques for inequality joins, we show that our solution is more scalable and several orders of magnitude faster. © 2016 Springer-Verlag Berlin Heidelberg | en
dc.description.sponsorship | Portions of the research in this paper used the MDC Database made available by Idiap Research Institute, Switzerland and owned by Nokia. | en
dc.publisher | Springer Nature | en
dc.relation.url | http://link.springer.com/article/10.1007%2Fs00778-016-0441-6 | en
dc.subject | Incremental | en
dc.subject | Inequality join | en
dc.subject | PostgreSQL | en
dc.subject | Selectivity estimation | en
dc.subject | Spark SQL | en
dc.title | Fast and scalable inequality joins | en
dc.type | Article | en
dc.contributor.department | King Abdullah University of Science and Technology, Thuwal, Saudi Arabia | en
dc.identifier.journal | The VLDB Journal | en
dc.contributor.institution | Qatar Computing Research Institute, Doha, Qatar | en
dc.contributor.institution | Arizona State University, Tempe, AZ, United States | en
kaust.author | Khayyat, Zuhair | en
kaust.author | Kalnis, Panos | en