Robust hybrid name disambiguation framework for large databases

Handle URI:
http://hdl.handle.net/10754/563050
Title:
Robust hybrid name disambiguation framework for large databases
Authors:
Zhu, Jia; Yang, Yi; Xie, Qing (0000-0003-4530-588X); Wang, Liwei; Hassan, Saeed-Ul
Abstract:
In many databases, such as scientific bibliography databases, the name attribute is the most commonly chosen identifier for entities. However, names are often ambiguous and not always unique, which causes problems in many fields. Name disambiguation is a non-trivial data management task that aims to distinguish different entities that share the same name. The task is particularly difficult for large databases such as digital libraries, where only limited information is available to identify authors' names. In digital libraries, ambiguous author names arise when multiple authors share the same name or when one person appears under different name variations. Most previous work on this problem employs hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose a robust hybrid name disambiguation framework that is applicable not only to digital libraries but can also be easily extended to other applications based on different data sources. We propose a web page genre identification component to identify the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve the performance of name disambiguation. We evaluated our approach on known corpora, and the favorable experimental results indicate that the proposed framework is feasible. © 2013 Akadémiai Kiadó, Budapest, Hungary.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program
Publisher:
Springer Nature
Journal:
Scientometrics
Issue Date:
26-Oct-2013
DOI:
10.1007/s11192-013-1151-0
Type:
Article
ISSN:
01389130
Appears in Collections:
Articles; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Zhu, Jia | en
dc.contributor.author | Yang, Yi | en
dc.contributor.author | Xie, Qing | en
dc.contributor.author | Wang, Liwei | en
dc.contributor.author | Hassan, Saeed-Ul | en
dc.date.accessioned | 2015-08-03T11:34:37Z | en
dc.date.available | 2015-08-03T11:34:37Z | en
dc.date.issued | 2013-10-26 | en
dc.identifier.issn | 01389130 | en
dc.identifier.doi | 10.1007/s11192-013-1151-0 | en
dc.identifier.uri | http://hdl.handle.net/10754/563050 | en
dc.description.abstract | In many databases, such as scientific bibliography databases, the name attribute is the most commonly chosen identifier for entities. However, names are often ambiguous and not always unique, which causes problems in many fields. Name disambiguation is a non-trivial data management task that aims to distinguish different entities that share the same name. The task is particularly difficult for large databases such as digital libraries, where only limited information is available to identify authors' names. In digital libraries, ambiguous author names arise when multiple authors share the same name or when one person appears under different name variations. Most previous work on this problem employs hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose a robust hybrid name disambiguation framework that is applicable not only to digital libraries but can also be easily extended to other applications based on different data sources. We propose a web page genre identification component to identify the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve the performance of name disambiguation. We evaluated our approach on known corpora, and the favorable experimental results indicate that the proposed framework is feasible. © 2013 Akadémiai Kiadó, Budapest, Hungary. | en
dc.publisher | Springer Nature | en
dc.subject | Clustering | en
dc.subject | Genre identification | en
dc.subject | Multidimensional scaling | en
dc.subject | Name disambiguation | en
dc.title | Robust hybrid name disambiguation framework for large databases | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.identifier.journal | Scientometrics | en
dc.contributor.institution | School of Computer Science, South China Normal University, Guangzhou, China | en
dc.contributor.institution | School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, United States | en
dc.contributor.institution | Wuhan University, Wuhan, China | en
dc.contributor.institution | COMSATS Institute of Information Technology, Lahore, Pakistan | en
kaust.author | Xie, Qing | en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.