dc.contributor.advisor: Zhang, Xiangliang
dc.contributor.author: Werfelmann, Robert
dc.date.accessioned: 2018-05-24T13:12:20Z
dc.date.available: 2018-05-24T13:12:20Z
dc.date.issued: 2018-05-24
dc.identifier.doi: 10.25781/KAUST-N008I
dc.identifier.uri: http://hdl.handle.net/10754/627954
dc.description.abstract: Native Language Identification (NLI) is the task of predicting an author’s native language from text they have written in a second language. The idea is to find writing habits that transfer from an author’s native language to their second language. Many approaches to this task have been studied, from simple word frequency analysis to the analysis of grammatical and spelling mistakes, in order to find patterns and traits shared by authors with the same native language. The task can be very complex, depending on the native language and on the author’s proficiency in the second language. The most common approach, which has produced very good results, is based on n-gram features of words and characters. In this thesis, we attempt to extract lexical, grammatical, and semantic features from the sentences of non-native English essays using neural networks. The training and testing data were obtained from a large corpus of publicly available essays written by authors from several countries around the world. The neural network models consisted of Long Short-Term Memory and Convolutional networks that take the sentences of each document as input. Additional statistical features were generated from the text to complement the predictions of the neural networks; together these were used as feature inputs to a Support Vector Machine, which made the final prediction. Results show that a Long Short-Term Memory neural network can improve performance over a naive bag-of-words approach while using a much smaller feature set. With more fine-tuning of the neural network hyperparameters, these results will likely improve significantly.
dc.language.iso: en
dc.subject: native language identification
dc.subject: Natural language processing
dc.subject: Machine learning
dc.subject: Classification
dc.subject: neural networks
dc.title: A Study of Recurrent and Convolutional Neural Networks in the Native Language Identification Task
dc.type: Thesis
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Moshkov, Mikhail
dc.contributor.committeemember: Gao, Xin
thesis.degree.discipline: Computer Science
thesis.degree.name: Master of Science
refterms.dateFOA: 2018-06-13T13:22:22Z


Files in this item

Name: Thesis2018.pdf
Size: 547.1Kb
Format: PDF
