Lightning fast and space efficient inequality joins

Handle URI:
http://hdl.handle.net/10754/593180
Title:
Lightning fast and space efficient inequality joins
Authors:
Khayyat, Zuhair ( 0000-0003-3650-6997 ) ; Lucia, William; Singh, Meghna; Ouzzani, Mourad; Papotti, Paolo; Quiané-Ruiz, Jorge-Arnulfo; Tang, Nan; Kalnis, Panos ( 0000-0002-5060-1360 )
Abstract:
Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there has been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join, to various indices such as B+-tree, R*-tree and Bitmap, inequality joins have received little attention and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put columns to be joined in sorted arrays and we use permutation arrays to encode positions of tuples in one sorted array w.r.t. the other sorted array. In contrast to sort-merge join, we use space-efficient bit-arrays that enable optimizations, such as Bloom filter indices, for fast computation of the join results. We have implemented a centralized version of these algorithms on top of PostgreSQL, and a distributed version on top of Spark SQL. We have compared against well-known optimization techniques for inequality joins and show that our solution is more scalable and several orders of magnitude faster.
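The abstract names three ingredients: sorted arrays, a permutation array, and a bit-array. The following is a minimal sketch of how they can fit together. It is not the paper's implementation. It assumes a self-join of one table with two conditions (`r.x < s.x AND r.y > s.y`), uses a Python `bytearray` as a stand-in for a packed bit-array, and omits the Bloom filter optimization. The function name `inequality_self_join` and all variable names are hypothetical.

```python
from bisect import bisect_right
from itertools import groupby


def inequality_self_join(rows):
    """Return index pairs (i, j) with rows[i].x < rows[j].x and rows[i].y > rows[j].y.

    rows is a list of (x, y) tuples. Illustrative sketch only.
    """
    n = len(rows)
    # Sorted arrays: tuple indices ordered by x and by y.
    by_x = sorted(range(n), key=lambda i: rows[i][0])
    by_y = sorted(range(n), key=lambda i: rows[i][1])
    xs = [rows[i][0] for i in by_x]

    # Permutation array: position of each tuple inside the x-sorted array.
    pos_in_x = [0] * n
    for p, i in enumerate(by_x):
        pos_in_x[i] = p

    # Bit-array stand-in over x-sorted positions; a set entry means
    # "this tuple has a strictly smaller y than the one being probed".
    visited = bytearray(n)
    result = []

    # Walk tuples in ascending y, one group of equal y values at a time,
    # so that ties on y never match each other.
    for _, group in groupby(by_y, key=lambda i: rows[i][1]):
        group = list(group)
        for i in group:
            # First x-sorted position whose x is strictly greater than rows[i].x.
            start = bisect_right(xs, rows[i][0])
            for p in range(start, n):
                if visited[p]:
                    result.append((i, by_x[p]))
        for i in group:
            visited[pos_in_x[i]] = 1
    return result
```

Each probe scans only the part of the bit-array to the right of the tuple's own x position, which is the step that optimizations such as the Bloom filter indices mentioned in the abstract would aim to speed up.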
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Publisher:
VLDB Endowment
Journal:
Proceedings of the VLDB Endowment
Conference/Event name:
Proceedings of the VLDB Endowment - Proceedings of the 41st International Conference on Very Large Data Bases
Issue Date:
1-Sep-2015
DOI:
10.14778/2831360.2831362
Type:
Conference Paper
ISSN:
2150-8097
Citation:
Lightning fast and space efficient inequality joins 2015, 8 (13):2074 Proceedings of the VLDB Endowment
Additional Links:
http://dl.acm.org/citation.cfm?doid=2831360.2831362
Appears in Collections:
Conference Papers; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Khayyat, Zuhair | en
dc.contributor.author | Lucia, William | en
dc.contributor.author | Singh, Meghna | en
dc.contributor.author | Ouzzani, Mourad | en
dc.contributor.author | Papotti, Paolo | en
dc.contributor.author | Quiané-Ruiz, Jorge-Arnulfo | en
dc.contributor.author | Tang, Nan | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2016-01-10T10:24:55Z | en
dc.date.available | 2016-01-10T10:24:55Z | en
dc.date.issued | 2015-09-01 | en
dc.identifier.issn | Lightning fast and space efficient inequality joins 2015, 8 (13):2074 Proceedings of the VLDB Endowment | en
dc.identifier.issn | 21508097 | en
dc.identifier.doi | 10.14778/2831360.2831362 | en
dc.identifier.uri | http://hdl.handle.net/10754/593180 | en
dc.publisher | VLDB Endowment | en
dc.relation.url | http://dl.acm.org/citation.cfm?doid=2831360.2831362 | en
dc.rights | This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. | en
dc.title | Lightning fast and space efficient inequality joins | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Proceedings of the VLDB Endowment | en
dc.conference.date | September 5–9, 2015 | en
dc.conference.name | Proceedings of the VLDB Endowment - Proceedings of the 41st International Conference on Very Large Data Bases | en
dc.conference.location | Kohala Coast, Hawaii | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Qatar Computing Research Institute | en
kaust.author | Khayyat, Zuhair | en
kaust.author | Kalnis, Panos | en