Efficient locality-sensitive hashing over high-dimensional streaming data
dc.contributor.author | Wang, Hao | |
dc.contributor.author | Yang, Chengcheng | |
dc.contributor.author | Zhang, Xiangliang | |
dc.contributor.author | Gao, Xin | |
dc.date.accessioned | 2020-09-27T04:47:26Z | |
dc.date.available | 2020-09-27T04:47:26Z | |
dc.date.issued | 2020-09-17 | |
dc.date.submitted | 2020-07-20 | |
dc.identifier.citation | Wang, H., Yang, C., Zhang, X., & Gao, X. (2020). Efficient locality-sensitive hashing over high-dimensional streaming data. Neural Computing and Applications. doi:10.1007/s00521-020-05336-1 | |
dc.identifier.issn | 1433-3058 | |
dc.identifier.issn | 0941-0643 | |
dc.identifier.doi | 10.1007/s00521-020-05336-1 | |
dc.identifier.uri | http://hdl.handle.net/10754/665289 | |
dc.description.abstract | Approximate nearest neighbor (ANN) search in high-dimensional spaces is fundamental in many applications. Locality-sensitive hashing (LSH) is a well-known methodology to solve the ANN problem. Existing LSH-based ANN solutions typically employ a large number of individual indexes optimized for searching efficiency. Updating such indexes might be impractical when processing high-dimensional streaming data. In this paper, we present a novel disk-based LSH index that offers efficient support for both searches and updates. The contributions of our work are threefold. First, we use the write-friendly LSM-trees to store the LSH projections to facilitate efficient updates. Second, we develop a novel estimation scheme to estimate the number of required LSH functions, with which the disk storage and access costs are effectively reduced. Third, we exploit both the collision number and the projection distance to improve the efficiency of candidate selection, improving the search performance with theoretical guarantees on the result quality. Experiments on four real-world datasets show that our proposal outperforms the state-of-the-art schemes. | |
dc.description.sponsorship | The authors would like to thank the editor and anonymous reviewers for their valuable suggestions and comments. This work was funded in part by the Center of Excellence for NEOM Research at KAUST, REI/1/4178-01-01, the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award numbers BAS/1/1624-01, REI/1/0018-01-01, REI/1/4216-01-01, REI/1/4437-01-01, and REI/1/4473-01-01. | |
dc.publisher | Springer Science and Business Media LLC | |
dc.relation.url | http://link.springer.com/10.1007/s00521-020-05336-1 | |
dc.rights | Archived with thanks to Neural Computing and Applications | |
dc.title | Efficient locality-sensitive hashing over high-dimensional streaming data | |
dc.type | Article | |
dc.contributor.department | Computational Bioscience Research Center (CBRC) | |
dc.contributor.department | Computer Science Program | |
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | |
dc.contributor.department | Machine Intelligence & kNowledge Engineering Lab | |
dc.contributor.department | Machine Intelligence and kNowledge Engineering Laboratory, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia | |
dc.contributor.department | Structural and Functional Bioinformatics Group | |
dc.identifier.journal | Neural Computing and Applications | |
dc.rights.embargodate | 2021-09-17 | |
dc.eprint.version | Post-print | |
dc.contributor.institution | Shenzhen University, Shenzhen, China | |
kaust.person | Wang, Hao | |
kaust.person | Yang, Chengcheng | |
kaust.person | Zhang, Xiangliang | |
kaust.person | Gao, Xin | |
kaust.grant.number | BAS/1/1624 | |
dc.date.accepted | 2020-09-02 | |
dc.identifier.eid | 2-s2.0-85091161860 | |
refterms.dateFOA | 2020-12-09T13:36:53Z | |
kaust.acknowledged.supportUnit | Center of Excellence for NEOM Research | |
kaust.acknowledged.supportUnit | Office of Sponsored Research (OSR) |
This item appears in the following Collection(s)
- Articles
- Structural and Functional Bioinformatics Group
  For more information visit: https://sfb.kaust.edu.sa/Pages/Home.aspx
- Computer Science Program
  For more information visit: https://cemse.kaust.edu.sa/cs
- Computational Bioscience Research Center (CBRC)
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
  For more information visit: https://cemse.kaust.edu.sa/