Unsupervised Mitigation of Gender Bias by Character Components: A Case Study of Chinese Word Embedding
Type
Conference Paper
KAUST Department
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Machine Intelligence & kNowledge Engineering Lab
Structural and Functional Bioinformatics Group
Date
2022-07-15
Permanent link to this record
http://hdl.handle.net/10754/681590
Abstract
Word embeddings learned from massive text collections exhibit significant discriminative biases. However, debiasing for Chinese, one of the most widely spoken languages, remains underexplored. Existing approaches also rely on manually created supplementary data, which is time-consuming and labor-intensive to produce. In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data. Concretely, during training CGE exploits and emphasizes the rich feminine and masculine information carried by radicals, a type of component of Chinese characters, which in turn alleviates discriminative gender biases. Experimental results show that our unsupervised method outperforms state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.
Citation
Chen, X., Li, M., Yan, R., Gao, X., & Zhang, X. (2022). Unsupervised Mitigating Gender Bias by Character Components: A Case Study of Chinese Word Embedding. Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP). https://doi.org/10.18653/v1/2022.gebnlp-1.14
Sponsors
We would like to thank the anonymous reviewers for their constructive comments. The work was supported by King Abdullah University of Science and Technology (KAUST) through grant awards Nos. BAS/1/1624-01, FCC/1/1976-18-01, FCC/1/1976-23-01, FCC/1/1976-25-01, FCC/1/1976-26-01.
Conference/Event name
4th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2022
ISBN
9781955917681
Additional Links
https://aclanthology.org/2022.gebnlp-1.14
https://aclanthology.org/2022.gebnlp-1.14.pdf
DOI
10.18653/v1/2022.gebnlp-1.14
Except where otherwise noted, this item's license is described as Archived with thanks to Association for Computational Linguistics under a Creative Commons license, details at: https://creativecommons.org/licenses/by/4.0/