A crowd-efficient learning approach for NER based on online encyclopedia
Type
Article
Date
2019-12-02
Online Publication Date
2019-12-02
Print Publication Date
2020-01
Embargo End Date
2020-12-02
Permanent link to this record
http://hdl.handle.net/10754/660566
Abstract
Named Entity Recognition (NER) is a core task of NLP. State-of-the-art supervised NER models rely heavily on large amounts of high-quality annotated data, which is expensive to obtain. Various methods have been proposed to reduce this reliance on large training sets, but only with limited effect. In this paper, we propose a crowd-efficient learning approach for supervised NER that makes full use of online encyclopedia pages. In our approach, we first define three criteria (representativeness, informativeness, diversity) to help select a much smaller set of samples for crowd labeling. We then propose a data augmentation method that uses the structured knowledge of the online encyclopedia to generate much more training data and thereby strengthen training. After training the model on the augmented sample set, we re-select new samples for crowd labeling to refine the model. We repeat the training and selection procedure until the model cannot be further improved or its performance meets our requirement. Our empirical study on several real data collections shows that our approach can reduce manual annotation by 50% while achieving almost the same NER performance as the fully trained model.
Citation
Li, M., Li, Z., Yang, Q., Chen, Z., Zhao, P., & Zhao, L. (2019). A crowd-efficient learning approach for NER based on online encyclopedia. World Wide Web. doi:10.1007/s11280-019-00736-3
Sponsors
This research is partially supported by Natural Science Foundation of Jiangsu Province (No. BK20191420), National Natural Science Foundation of China (Grant No. 61632016, 61572336, 61572335, 61772356), Natural Science Research Project of Jiangsu Higher Education Institution (No. 17KJA520003, 18KJA520010), and the Open Program of Neusoft Corporation (No. SKLSAOP1801).
Publisher
Springer Nature
Journal
World Wide Web
Additional Links
http://link.springer.com/10.1007/s11280-019-00736-3
DOI
10.1007/s11280-019-00736-3
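
Illustrative sketch
The iterative procedure described in the abstract (select samples, crowd-label them, augment, train, and repeat until a stopping condition holds) can be pictured with the minimal Python sketch below. It is not the authors' implementation. Every name and parameter (`score`, `crowd_label`, `augment`, `train`, `evaluate`, `batch_size`, `target`, `min_gain`) is a hypothetical placeholder, and how the three criteria are combined into a single score is an assumption left to the caller.

```python
from typing import Any, Callable, List, Sequence, Tuple


def crowd_efficient_loop(
    pool: Sequence[str],
    score: Callable[[str, Any], float],   # assumed: combines the three criteria
    crowd_label: Callable[[str], Any],    # assumed: returns a crowd annotation
    augment: Callable[[List[Tuple[str, Any]]], List[Any]],  # assumed: encyclopedia-based
    train: Callable[[List[Any]], Any],    # assumed: returns a trained model
    evaluate: Callable[[Any], float],     # assumed: held-out performance
    batch_size: int,
    target: float,
    min_gain: float = 1e-3,
) -> Tuple[Any, int]:
    """Return the last trained model and the number of crowd-labeled samples."""
    remaining = list(pool)
    labeled: List[Tuple[str, Any]] = []
    model = None
    best = float("-inf")

    while remaining:
        # Rank unlabeled samples with the current model (None in the first round).
        remaining.sort(key=lambda s: score(s, model), reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]

        # Send only the selected batch to the crowd.
        labeled.extend((s, crowd_label(s)) for s in batch)

        # Train on the augmented set, then check the stopping conditions.
        model = train(augment(labeled))
        perf = evaluate(model)
        if perf >= target or perf - best < min_gain:
            break
        best = perf

    return model, len(labeled)
```

The loop stops either when the performance requirement is met or when one more round of labeling no longer improves the model by at least `min_gain`, which mirrors the two stopping conditions named in the abstract.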