A Study of Recurrent and Convolutional Neural Networks in the Native Language Identification Task
Type
Thesis
Authors
Werfelmann, Robert
Advisors
Zhang, Xiangliang
Committee members
Moshkov, Mikhail
Gao, Xin

Program
Computer Science
Date
2018-05-24
Permanent link to this record
http://hdl.handle.net/10754/627954
Abstract
Native Language Identification (NLI) is the task of predicting an author’s native language from text they have written in a second language. The idea is to find writing habits that transfer from an author’s native language to their second language. Many approaches to this task have been studied, from simple word frequency analysis to analyzing grammatical and spelling mistakes in order to find patterns and traits shared by authors of the same native language. The difficulty of the task depends on the native language and on the author’s proficiency in the second language. The most common approach that has seen very good results is based on n-gram features of words and characters. In this thesis, we attempt to extract lexical, grammatical, and semantic features from the sentences of non-native English essays using neural networks. The training and testing data were obtained from a large corpus of publicly available essays written by authors from several countries around the world. The neural network models consisted of Long Short-Term Memory and Convolutional networks that take the sentences of each document as input. Additional statistical features were generated from the text to complement the predictions of the neural networks; together, these were used as feature inputs to a Support Vector Machine, which made the final prediction. Results show that a Long Short-Term Memory neural network can improve performance over a naive bag-of-words approach while using a much smaller feature set. With more fine-tuning of neural network hyperparameters, these results will likely improve significantly.
Citation
Werfelmann, R. (2018). A Study of Recurrent and Convolutional Neural Networks in the Native Language Identification Task. KAUST Research Repository. https://doi.org/10.25781/KAUST-N008I
DOI
10.25781/KAUST-N008I
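
The abstract refers to n-gram features of words and characters as the most common approach to this task. As a rough illustration of what such features look like, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the function names `char_ngrams` and `word_ngrams`, the lowercasing, the whitespace tokenization, and the example sentence are all assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased string."""
    text = text.lower()
    # A string shorter than n yields an empty range, hence an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n=2):
    """Count overlapping word n-grams using simple whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical learner sentence, invented for this sketch (not from the corpus).
sentence = "I am agree with this statement"
print(char_ngrams(sentence).most_common(3))
print(word_ngrams(sentence).most_common(3))
```

Counts like these would typically be turned into a fixed-length vector per document before being passed to a classifier. Which n-gram sizes, weighting scheme, and classifier settings the thesis uses is not stated in the abstract.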