CrowdAidRepair: A Crowd-Aided Interactive Data Repairing Method

Handle URI:
http://hdl.handle.net/10754/611378
Title:
CrowdAidRepair: A Crowd-Aided Interactive Data Repairing Method
Authors:
Zhou, Jian; Li, Zhixu; Gu, Binbin; Xie, Qing ( 0000-0003-4530-588X ) ; Zhu, Jia; Zhang, Xiangliang ( 0000-0002-3574-5665 ) ; Li, Guoliang
Abstract:
Data repairing aims at discovering and correcting erroneous data in databases. Traditional methods rely on predefined quality rules to detect conflicts between data, but they may fail to choose the right way to fix a detected conflict. Recent efforts have turned to the power of the crowd for data repairing, but crowd-based repairing has its own drawbacks, such as high human intervention cost and inevitably low efficiency. In this paper, we propose a crowd-aided interactive data repairing method that combines the advantages of the rule-based and crowd-based methods. In particular, we investigate the interaction between crowd-based repairing and rule-based repairing, and show that by applying crowd-based repairing to a small portion of values, we can greatly improve the repairing quality of the rule-based method. Although we prove that the optimal interaction scheme, which uses the fewest values for crowd-based repairing while maximizing the imputation recall, is infeasible to achieve, our proposed solution still identifies an efficient scheme by investigating the inconsistencies and dependencies between values in the repairing process. Our empirical study on three data collections demonstrates the high repairing quality of CrowdAidRepair, as well as the efficiency of the generated interaction scheme over the baselines.
KAUST Department:
Computer Science Program
Citation:
Zhou, J., Li, Z., Gu, B., Xie, Q., Zhu, J., Zhang, X. and Li, G., 2016, April. CrowdAidRepair: A Crowd-Aided Interactive Data Repairing Method. In Database Systems for Advanced Applications (pp. 51-66). Springer International Publishing.
Publisher:
Springer Science + Business Media
Journal:
Database Systems for Advanced Applications
Issue Date:
25-Mar-2016
DOI:
10.1007/978-3-319-32025-0_4
Type:
Book Chapter
ISSN:
0302-9743
ISBN:
978-3-319-32024-3
Sponsors:
This research is partially supported by the Natural Science Foundation of China (Grant Nos. 61303019, 61402313, 61472263, 61572336), the Postdoctoral Scientific Research Funding of Jiangsu Province (No. 1501090B), the 58th batch of National Postdoctoral Funding (No. 2015M581859), and the Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu, China.
Additional Links:
http://link.springer.com/chapter/10.1007%2F978-3-319-32025-0_4
Appears in Collections:
Computer Science Program; Book Chapters

Full metadata record

DC Field | Value | Language
dc.contributor.author | Zhou, Jian | en
dc.contributor.author | Li, Zhixu | en
dc.contributor.author | Gu, Binbin | en
dc.contributor.author | Xie, Qing | en
dc.contributor.author | Zhu, Jia | en
dc.contributor.author | Zhang, Xiangliang | en
dc.contributor.author | Li, Guoliang | en
dc.date.accessioned | 2016-06-01T12:27:26Z | -
dc.date.available | 2016-06-01T12:27:26Z | -
dc.date.issued | 2016-03-25 | -
dc.identifier.isbn | 978-3-319-32024-3 | -
dc.identifier.issn | 0302-9743 | -
dc.identifier.doi | 10.1007/978-3-319-32025-0_4 | -
dc.identifier.uri | http://hdl.handle.net/10754/611378 | -
dc.language.iso | en | en
dc.publisher | Springer Science + Business Media | en
dc.relation.url | http://link.springer.com/chapter/10.1007%2F978-3-319-32025-0_4 | en
dc.rights | The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-32025-0_4 | en
dc.title | CrowdAidRepair: A Crowd-Aided Interactive Data Repairing Method | en
dc.type | Book Chapter | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | Database Systems for Advanced Applications | en
dc.eprint.version | Post-print | en
dc.contributor.institution | School of Computer Science and Technology, Soochow University, Suzhou, China | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Zhang, Xiangliang | en