The Interaction Between Schema Matching and Record Matching in Data Integration
dc.contributor.author | Gu, Binbin | |
dc.contributor.author | Li, Zhixu | |
dc.contributor.author | Zhang, Xiangliang | |
dc.contributor.author | Liu, An | |
dc.contributor.author | Liu, Guanfeng | |
dc.contributor.author | Zheng, Kai | |
dc.contributor.author | Zhao, Lei | |
dc.contributor.author | Zhou, Xiaofang | |
dc.date.accessioned | 2017-01-02T09:55:33Z | |
dc.date.available | 2017-01-02T09:55:33Z | |
dc.date.issued | 2016-09-20 | |
dc.identifier.citation | Gu B, Li Z, Zhang X, Liu A, Liu G, et al. (2017) The Interaction Between Schema Matching and Record Matching in Data Integration. IEEE Transactions on Knowledge and Data Engineering 29: 186–199. Available: http://dx.doi.org/10.1109/TKDE.2016.2611577. | |
dc.identifier.issn | 1041-4347 | |
dc.identifier.doi | 10.1109/TKDE.2016.2611577 | |
dc.identifier.uri | http://hdl.handle.net/10754/622610 | |
dc.description.abstract | Schema Matching (SM) and Record Matching (RM) are two necessary steps in integrating multiple relational tables of different schemas, where SM unifies the schemas and RM detects records referring to the same real-world entity. The two processes have been thoroughly studied separately, but little attention has been paid to the interaction of SM and RM. In this work, we find that, even when alternated in a simple manner, SM and RM can benefit from each other to reach a better integration performance (i.e., in terms of precision and recall). Therefore, combining SM and RM is a promising solution for improving data integration. To this end, we define novel matching rules for SM and RM, respectively, such that every SM decision is made based on intermediate RM results, and vice versa, so that SM and RM can be performed alternately. The quality of integration is guaranteed by a Matching Likelihood Estimation model and the control of semantic drift, which prevent the effect of mismatch magnification. To reduce the computational cost, we design an index structure based on q-grams and a greedy search algorithm that can reduce the overhead of the interaction by around 90 percent. Extensive experiments on three data collections show that the combination and interaction of SM and RM significantly outperform previous works that conduct SM and RM separately. | |
dc.description.sponsorship | This research is partially supported by the Natural Science Foundation of China (Grant Nos. 61303019, 61402313, 61402312, 61472263, 61572336, 61572335, 61532018, 61502324), the King Abdullah University of Science and Technology, the Australian Research Council (Grant No. DP120102829), the Natural Science Foundation of Jiangsu (Grant No. BK20151223), the Postdoctoral scientific research funding of Jiangsu Province (No. 1501090B), and the National Postdoctoral Funding (Nos. 2015M581859, 2016T90493). | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | |
dc.relation.url | http://ieeexplore.ieee.org/document/7572193/ | |
dc.subject | record matching | |
dc.subject | Data integration | |
dc.subject | schema matching | |
dc.title | The Interaction Between Schema Matching and Record Matching in Data Integration | |
dc.type | Article | |
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | |
dc.contributor.department | Computer Science Program | |
dc.identifier.journal | IEEE Transactions on Knowledge and Data Engineering | |
dc.contributor.institution | School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu, China | |
dc.contributor.institution | School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia | |
kaust.person | Zhang, Xiangliang | |
dc.date.published-online | 2016-09-20 | |
dc.date.published-print | 2017-01-01 |
This item appears in the following Collection(s)
- Articles
- Computer Science Program
  For more information visit: https://cemse.kaust.edu.sa/cs
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
  For more information visit: https://cemse.kaust.edu.sa/