Lightning Fast and Space Efficient Inequality Joins

Handle URI:
http://hdl.handle.net/10754/581345
Title:
Lightning Fast and Space Efficient Inequality Joins
Authors:
Khayyat, Zuhair; Lucia, William; Singh, Meghna; Ouzzani, Mourad; Papotti, Paolo; Quiane-Ruiz, Jorge-Arnulfo; Tang, Nan; Kalnis, Panos (0000-0002-5060-1360)
Abstract:
Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there has been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join, to various indices such as B+-tree, R*-tree and Bitmap, inequality joins have received little attention and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put columns to be joined in sorted arrays and we use permutation arrays to encode positions of tuples in one sorted array w.r.t. the other sorted array. In contrast to sort-merge join, we use space-efficient bit-arrays that enable optimizations, such as Bloom filter indices, for fast computation of the join results. We have implemented a centralized version of these algorithms on top of PostgreSQL, and a distributed version on top of Spark SQL. We have compared against well-known optimization techniques for inequality joins and show that our solution is more scalable and several orders of magnitude faster.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Journal:
Proceedings of the VLDB Endowment
Conference/Event name:
The 41st International Conference on Very Large Data Bases
Issue Date:
Sep-2015
Type:
Conference Paper
Additional Links:
http://www.vldb.org/pvldb/vol8/p2074-khayyat.pdf
Appears in Collections:
Conference Papers; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
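
Illustrative sketch (not part of the repository record):
The abstract describes three ingredients: sorted arrays on the join columns, a permutation array relating the two sort orders, and a bit-array that is scanned to produce results. The Python sketch below is a simplified illustration of that idea for one hypothetical query, R.x < S.x AND R.y > S.y. It is not the paper's algorithm as published. The function name, the tuple layout, and the tie-breaking rules are assumptions made for this example, and a bytearray with one byte per position stands in for a true bit-array.

```python
def inequality_join(R, S):
    """Return index pairs (i, j) with R[i].x < S[j].x and R[i].y > S[j].y.

    R and S are lists of (x, y) tuples. Hypothetical helper for illustration.
    """
    # Tag every tuple as (x, y, side, original index); side 0 = S, 1 = R.
    tuples = [(x, y, 1, i) for i, (x, y) in enumerate(R)]
    tuples += [(x, y, 0, j) for j, (x, y) in enumerate(S)]
    n = len(tuples)

    # Sorted array on x. On ties S precedes R, so an S tuple with the
    # same x is never to the right of an R tuple (strict R.x < S.x).
    by_x = sorted(range(n), key=lambda t: (tuples[t][0], tuples[t][2]))
    # Sorted array on y. On ties R precedes S, so an S tuple with the
    # same y is not yet marked when R is visited (strict R.y > S.y).
    by_y = sorted(range(n), key=lambda t: (tuples[t][1], -tuples[t][2]))

    # Permutation array: x-order position of the k-th tuple in y-order.
    pos_in_x = [0] * n
    for p, t in enumerate(by_x):
        pos_in_x[t] = p
    perm = [pos_in_x[t] for t in by_y]

    # Bit-array over x-order positions (one byte per position here).
    bits = bytearray(n)
    result = []
    for k in range(n):
        t, p = by_y[k], perm[k]
        if tuples[t][2] == 0:
            # S tuple: mark it as seen, meaning its y is smaller than
            # the y of any R tuple visited later.
            bits[p] = 1
        else:
            # R tuple: every marked position to the right has a larger x
            # and a smaller y, so it satisfies both conditions.
            for q in range(p + 1, n):
                if bits[q]:
                    result.append((tuples[t][3], tuples[by_x[q]][3]))
    return result


R = [(1, 5), (3, 2), (2, 9)]
S = [(2, 3), (4, 1), (3, 9), (1, 0)]
pairs = sorted(inequality_join(R, S))
```

Sorting costs O(n log n), and the scan touches only positions to the right of each R tuple, so the nested-loop comparison of every pair is avoided. The optimizations mentioned in the abstract, such as Bloom filter indices over the bit-array, are not modeled in this sketch.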

Full metadata record

DC Field | Value | Language
dc.contributor.author | Khayyat, Zuhair | en
dc.contributor.author | Lucia, William | en
dc.contributor.author | Singh, Meghna | en
dc.contributor.author | Ouzzani, Mourad | en
dc.contributor.author | Papotti, Paolo | en
dc.contributor.author | Quiane-Ruiz, Jorge-Arnulfo | en
dc.contributor.author | Tang, Nan | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2015-10-28T13:40:12Z | en
dc.date.available | 2015-10-28T13:40:12Z | en
dc.date.issued | 2015-09 | en
dc.identifier.uri | http://hdl.handle.net/10754/581345 | en
dc.relation.url | http://www.vldb.org/pvldb/vol8/p2074-khayyat.pdf | en
dc.rights | This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. | en
dc.title | Lightning Fast and Space Efficient Inequality Joins | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Proceedings of the VLDB Endowment | en
dc.conference.date | September 2015 | en
dc.conference.name | The 41st International Conference on Very Large Data Bases | en
dc.conference.location | Kohala Coast, Hawaii | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Qatar Computing Research Institute | en