dc.contributor.advisor: Hoehndorf, Robert
dc.contributor.author: Toonsi, Sumyyah
dc.date.accessioned: 2019-08-26T08:10:43Z
dc.date.available: 2019-08-26T08:10:43Z
dc.date.issued: 2019-08-25
dc.identifier.doi: 10.25781/KAUST-0D55A
dc.identifier.uri: http://hdl.handle.net/10754/656601
dc.description.abstract: Knowledge of a protein's function is essential to many studies in molecular biology, genetic experiments and protein-protein interactions. The Gene Ontology (GO) captures the functions of gene products in classes and establishes relationships between them. Manually annotating proteins with GO functions from the biomedical literature is a tedious process that calls for automation. We develop a novel, dictionary-based method to annotate proteins with functions from text. We extract text-based features from words matched against a dictionary of GO. Since a class is included upon any word match with its class description, negative samples outnumber positive ones. To mitigate this imbalance, we apply strict rules before weakly labeling the dataset according to the curated annotations. Furthermore, we discard samples with low statistical evidence and train a logistic regression classifier. In a 5-fold cross-validation, the best-performing fold reached a precision of 91% and an accuracy of 96%, while the worst fold reached a precision of 80% and an accuracy of 95%. We conclude by explaining how this method can be used for similar annotation problems.
dc.language.iso: en
dc.subject: Protein function
dc.subject: Gene Ontology
dc.subject: Text Mining
dc.subject: Biomedical
dc.subject: Annotation
dc.subject: Automatic
dc.title: Automatic Protein Function Annotation Through Text Mining
dc.type: Thesis
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Moshkov, Mikhail
dc.contributor.committeemember: Bajic, Vladimir B.
thesis.degree.discipline: Computer Science
thesis.degree.name: Master of Science
refterms.dateFOA: 2019-08-26T08:10:44Z
kaust.request.doi: yes


Files in this item

Name: Thesis_Final_Sumyyah.pdf
Size: 393.7Kb
Format: PDF
