Quality and efficiency in high dimensional nearest neighbor search

Handle URI:
http://hdl.handle.net/10754/564236
Title:
Quality and efficiency in high dimensional nearest neighbor search
Authors:
Tao, Yufei; Yi, Ke; Sheng, Cheng; Kalnis, Panos (ORCID: 0000-0002-5060-1360)
Abstract:
Nearest neighbor (NN) search in high dimensional space is an important problem in many applications. Ideally, a practical solution (i) should be implementable in a relational database, and (ii) its query cost should grow sub-linearly with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution fulfills both requirements, except locality sensitive hashing (LSH). The existing LSH implementations are either rigorous or adhoc. Rigorous-LSH ensures good quality of query results, but requires expensive space and query cost. Although adhoc-LSH is more efficient, it abandons quality control, i.e., the neighbor it outputs can be arbitrarily bad. As a result, currently no method is able to ensure both quality and efficiency simultaneously in practice. Motivated by this, we propose a new access method called the locality sensitive B-tree (LSB-tree) that enables fast high-dimensional NN search with excellent quality. The combination of several LSB-trees leads to a structure called the LSB-forest that ensures the same result quality as rigorous-LSH, but reduces its space and query cost dramatically. The LSB-forest also outperforms adhoc-LSH, even though the latter has no quality guarantee. Besides its appealing theoretical properties, the LSB-tree itself also serves as an effective index that consumes linear space, and supports efficient updates. Our extensive experiments confirm that the LSB-tree is faster than (i) the state of the art of exact NN search by two orders of magnitude, and (ii) the best (linear-space) method of approximate retrieval by an order of magnitude, and at the same time, returns neighbors with much better quality. © 2009 ACM.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program
Publisher:
Association for Computing Machinery (ACM)
Journal:
Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09
Conference/Event name:
International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09
Issue Date:
2009
DOI:
10.1145/1559845.1559905
Type:
Conference Paper
ISBN:
9781605585543
Appears in Collections:
Conference Papers; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Tao, Yufei | en
dc.contributor.author | Yi, Ke | en
dc.contributor.author | Sheng, Cheng | en
dc.contributor.author | Kalnis, Panos | en
dc.date.accessioned | 2015-08-04T06:20:08Z | en
dc.date.available | 2015-08-04T06:20:08Z | en
dc.date.issued | 2009 | en
dc.identifier.isbn | 9781605585543 | en
dc.identifier.doi | 10.1145/1559845.1559905 | en
dc.identifier.uri | http://hdl.handle.net/10754/564236 | en
dc.publisher | Association for Computing Machinery (ACM) | en
dc.subject | Locality sensitive hashing | en
dc.subject | Nearest neighbor search | en
dc.title | Quality and efficiency in high dimensional nearest neighbor search | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09 | en
dc.conference.date | 29 June 2009 through 2 July 2009 | en
dc.conference.name | International Conference on Management of Data and 28th Symposium on Principles of Database Systems, SIGMOD-PODS'09 | en
dc.conference.location | Providence, RI | en
dc.contributor.institution | Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong | en
dc.contributor.institution | Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong | en
kaust.author | Kalnis, Panos | en