An unsupervised learning approach for NER based on online encyclopedia
Name:
An Unsupervised Learning Approach for NER based on Online Encyclopedia.pdf
Size:
687.5Kb
Format:
PDF
Description:
Accepted manuscript
Type
Conference Paper
Date
2019-07-18
Online Publication Date
2019-07-18
Print Publication Date
2019
Permanent link to this record
http://hdl.handle.net/10754/656840
Abstract
Named Entity Recognition (NER) is a core task of NLP. State-of-the-art supervised NER models rely heavily on large amounts of high-quality annotated data, which are expensive to obtain. Various approaches have been proposed to reduce this reliance on large training sets, but only with limited effect. In this paper, we propose a novel way to make full use of the weakly annotated text in encyclopedia pages for fully unsupervised NER learning, which makes it possible to train an NER model with no manually labeled data at all. Briefly, we divide the sentences of encyclopedia pages into two groups according to the density of internal URL links contained in each sentence. A relatively small number of sentences with dense links are used directly to train the initial NER model, while the remaining sentences with sparse links are then selectively added to improve the model gradually over several self-training iterations. Given the limited number of densely linked sentences available for training, we propose a data augmentation method that uses the structured data of the encyclopedia to generate much more training data and thereby strengthen training. In addition, in the iterative self-training step, we use a graph model to help estimate the labeling quality of the sentences with sparse links; those with the highest labeling quality are added to the training set to update the model in the next iteration. Our empirical study shows that an NER model trained with our unsupervised learning approach can perform even better than several state-of-the-art models fully trained on newswire data.
Citation
Li, M., Yang, Q., He, F., Li, Z., Zhao, P., Zhao, L., & Chen, Z. (2019). An Unsupervised Learning Approach for NER Based on Online Encyclopedia. Lecture Notes in Computer Science, 329–344. doi:10.1007/978-3-030-26072-9_25
Sponsors
This research is partially supported by the National Natural Science Foundation of China (Grant No. 61632016, 61572336, 61572335, 61772356), and the Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003, 18KJA520010).
Publisher
Springer International Publishing
Conference/Event name
3rd APWeb and WAIM Joint Conference on Web and Big Data, APWeb-WAIM 2019
Additional Links
http://link.springer.com/10.1007/978-3-030-26072-9_25
DOI
10.1007/978-3-030-26072-9_25