
dc.contributor.author: Li, Maolong
dc.contributor.author: Yang, Qiang
dc.contributor.author: He, Fuzhen
dc.contributor.author: Li, Zhixu
dc.contributor.author: Zhao, Pengpeng
dc.contributor.author: Zhao, Lei
dc.contributor.author: Chen, Zhigang
dc.date.accessioned: 2019-10-02T11:45:28Z
dc.date.available: 2019-10-02T11:45:28Z
dc.date.issued: 2019-07-18
dc.identifier.citation: Li, M., Yang, Q., He, F., Li, Z., Zhao, P., Zhao, L., & Chen, Z. (2019). An Unsupervised Learning Approach for NER Based on Online Encyclopedia. Lecture Notes in Computer Science, 329–344. doi:10.1007/978-3-030-26072-9_25
dc.identifier.doi: 10.1007/978-3-030-26072-9_25
dc.identifier.uri: http://hdl.handle.net/10754/656840
dc.description.abstract: Named Entity Recognition (NER) is a core task in NLP. State-of-the-art supervised NER models rely heavily on large amounts of high-quality annotated data, which is expensive to obtain. Various approaches have been proposed to reduce this reliance on large training sets, but with only limited effect. In this paper, we propose a novel way to make full use of the weakly-annotated texts in encyclopedia pages for fully unsupervised NER learning, which makes it possible to train an NER model with no manually-labeled data at all. Briefly, we divide the sentences of encyclopedia pages into two parts according to the density of internal URL links contained in each sentence. A relatively small number of sentences with dense links are used directly to train the initial NER model, while the remaining sentences with sparse links are then carefully selected to gradually improve the model over several self-training iterations. Given the limited number of densely-linked sentences available for training, we also propose a data augmentation method that generates much more training data with the help of the encyclopedia's structured data, greatly amplifying the training effect. In the iterative self-training step, we further utilize a graph model to estimate the labeling quality of the sparsely-linked sentences; those with the highest labeling quality are added to the training set for updating the model in the next iteration. Our empirical study shows that an NER model trained with our unsupervised learning approach can perform even better than several state-of-the-art models fully trained on newswire data.
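The training pipeline described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the density threshold, and the ranking heuristic (plain link density standing in for the paper's graph-based quality estimate) are all assumptions.

```python
# Sketch of the self-training pipeline from the abstract.
# Sentences are lists of (token, has_link) pairs, where has_link marks
# tokens covered by an internal encyclopedia URL link.

def link_density(sentence):
    """Fraction of tokens in a sentence that carry a link annotation."""
    links = sum(1 for _tok, is_link in sentence if is_link)
    return links / len(sentence) if sentence else 0.0

def split_by_density(sentences, threshold=0.3):
    """Divide sentences into densely- and sparsely-linked groups."""
    dense = [s for s in sentences if link_density(s) >= threshold]
    sparse = [s for s in sentences if link_density(s) < threshold]
    return dense, sparse

def self_train(sentences, iterations=3, top_k=2):
    """Train on densely-linked sentences first, then iteratively absorb
    the sparsely-linked sentences with the highest estimated labeling
    quality (here approximated by link density)."""
    train_set, pool = split_by_density(sentences)
    for _ in range(iterations):
        if not pool:
            break
        # Stand-in for the paper's graph-based quality estimation:
        # rank the remaining pool and promote the best candidates.
        pool.sort(key=link_density, reverse=True)
        train_set.extend(pool[:top_k])
        pool = pool[top_k:]
    return train_set

# Toy example: three sentences as (token, has_link) pairs.
sents = [
    [("Einstein", True), ("was", False), ("born", False), ("in", False), ("Ulm", True)],
    [("He", False), ("studied", False), ("physics", True)],
    [("The", False), ("sky", False), ("is", False), ("blue", False)],
]
model_data = self_train(sents)
```

In the paper itself, the promotion step uses a graph model over the sparsely-linked sentences rather than raw link density, and the promoted sentences retrain an actual NER model at each iteration.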
dc.description.sponsorship: This research is partially supported by the National Natural Science Foundation of China (Grant No. 61632016, 61572336, 61572335, 61772356) and the Natural Science Research Project of Jiangsu Higher Education Institutions (No. 17KJA520003, 18KJA520010).
dc.publisher: Springer Nature
dc.relation.url: http://link.springer.com/10.1007/978-3-030-26072-9_25
dc.rights: Archived with thanks to Springer International Publishing
dc.title: An unsupervised learning approach for NER based on online encyclopedia
dc.type: Conference Paper
dc.contributor.department: King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
dc.conference.date: 2019-08-01 to 2019-08-03
dc.conference.name: 3rd APWeb and WAIM Joint Conference on Web and Big Data, APWeb-WAIM 2019
dc.conference.location: Chengdu, CHN
dc.eprint.version: Pre-print
dc.contributor.institution: Institute of Artificial Intelligence, School of Computer Science and Technology, Soochow University, Suzhou, China
dc.contributor.institution: IFLYTEK Research, Suzhou, China
dc.contributor.institution: State Key Laboratory of Cognitive Intelligence, iFLYTEK, Hefei, China
kaust.person: Yang, Qiang
refterms.dateFOA: 2019-10-03T12:48:48Z
dc.date.published-online: 2019-07-18
dc.date.published-print: 2019


Files in this item

Name: An Unsupervised Learning Approach for NER based on Online Encyclopedia.pdf
Size: 687.5 KB
Format: PDF
Description: Accepted manuscript
