Automated Extraction Of Associations Between Methylated Genes and Diseases From Biomedical Literature
AuthorsBin Res, Arwa A.
AdvisorsBajic, Vladimir B.
Embargo End Date2013-12-02
Permanent link to this recordhttp://hdl.handle.net/10754/264674
MetadataShow full item record
Access RestrictionsAt the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis became available to the public after the expiration of the embargo on 2013-12-02.
AbstractAssociations between methylated genes and diseases have been investigated in several studies, and it is critical to have such information available for better understanding of diseases and clinical decisions. However, such information is scattered in a large number of electronic publications and it is difficult to manually search for it. Therefore, the goal of the project is to develop a machine learning model that can efficiently extract such information. Twelve machine learning algorithms were applied and compared in application to this problem based on three approaches that involve: document-term frequency matrices, position weight matrices, and a hybrid approach that uses the combination of the previous two. The best results we obtained by the hybrid approach with a random forest model that, in a 10-fold cross-validation, achieved F-score and accuracy of nearly 85% and 84%, respectively. On a completely separate testing set, F-score and accuracy of 89% and 88%, respectively, were obtained. Based on this model, we developed a tool that automates extraction of associations between methylated genes and diseases from electronic text. Our study contributed an efficient method for extracting specific types of associations from free text and the methodology developed here can be extended to other similar association extraction problems.