
dc.contributor.author: Li, Zhixu
dc.contributor.author: Qin, Lu
dc.contributor.author: Cheng, Hong
dc.contributor.author: Zhang, Xiangliang
dc.contributor.author: Zhou, Xiaofang
dc.date.accessioned: 2018-11-22T10:06:45Z
dc.date.available: 2018-11-22T10:06:45Z
dc.date.issued: 2015-03-09
dc.identifier.citation: Li Z, Qin L, Cheng H, Zhang X, Zhou X (2015) TRIP: An Interactive Retrieving-Inferring Data Imputation Approach. IEEE Transactions on Knowledge and Data Engineering 27: 2550–2563. Available: http://dx.doi.org/10.1109/TKDE.2015.2411276.
dc.identifier.issn: 1041-4347
dc.identifier.doi: 10.1109/TKDE.2015.2411276
dc.identifier.uri: http://hdl.handle.net/10754/630001
dc.description.abstract: Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall by just inferring missing values from the complete part of the data set. Recently, some retrieving-based methods have been proposed to retrieve missing values from external resources such as the World Wide Web, which tend to reach a much higher imputation recall, but inevitably incur a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an inTeractive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately in filling in missing attribute values in a data set. To ensure high recall at minimum cost, TRIP faces the challenge of selecting the smallest number of missing values for retrieving so as to maximize the number of inferable values. Our proposed solution is able to identify an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also show with an example that the optimal scheme cannot feasibly be achieved in τ-constrained stochastic data imputation (τ-SDI), but still, our proposed solution identifies an expected-optimal scheme in τ-SDI. Extensive experiments on four data collections show that TRIP retrieves on average 20 percent of the missing values and achieves the same high recall as the retrieving-based approach.
dc.description.sponsorship: This work was supported in part by the Natural Science Foundation of China under Grants 61472263, 61402313, 61303019, the Australian Research Council under Grants DP140103171, DE140100999, the Hong Kong Research Grants Council (General Research Fund (GRF) Project CUHK 411211), the Chinese University of Hong Kong (Direct Grants 4055015, 4055048), and the King Abdullah University of Science and Technology.
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: Data Imputation
dc.subject: Data Repairing
dc.subject: Interactive Retrieving-Inferring
dc.title: TRIP: An Interactive Retrieving-Inferring Data Imputation Approach
dc.type: Article
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Computer Science Program
dc.identifier.journal: IEEE Transactions on Knowledge and Data Engineering
dc.contributor.institution: School of Computer Science and Technology, Soochow University, Suzhou, China
dc.contributor.institution: University of Technology, Sydney, NSW, Australia
dc.contributor.institution: Chinese University of Hong Kong, Hong Kong
dc.contributor.institution: School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, QLD, Australia
kaust.person: Zhang, Xiangliang
dc.date.published-online: 2015-03-09
dc.date.published-print: 2015-09-01

