dc.contributor.advisor: Bajic, Vladimir B.
dc.contributor.author: Arkasosy, Basil
dc.date.accessioned: 2013-06-03T18:43:39Z
dc.date.available: 2013-06-03T18:43:39Z
dc.date.issued: 2013-05-11
dc.identifier.doi: 10.25781/KAUST-N0Q62
dc.identifier.uri: http://hdl.handle.net/10754/293325
dc.description.abstract: Ambiguity in texts is a well-known problem: words can carry several meanings and hence can be read and interpreted differently. This is also true in the biological literature: names of biological concepts, such as genes and proteins, may be ambiguous, referring in some cases to more than one gene or protein, and in others to both a gene and a protein at the same time. Public biological databases provide useful information about genes and proteins, including their names. In this study, we made a thorough analysis of the nomenclatures of genes and proteins in two data sources and for six different species. We developed an automated process that parses, extracts, processes and stores information available in two major biological databases: Entrez Gene and UniProtKB. We analysed gene and protein synonyms, their types, their frequencies, and their ambiguities within a species, between data sources, and across species. We found that at least 40% of the cross-species ambiguities are caused by names that are already ambiguous within a species. Our study shows that, of the six species we analysed (Homo sapiens, Mus musculus, Arabidopsis thaliana, Oryza sativa, Bacillus subtilis and Pseudomonas fluorescens), rice (Oryza sativa) has the best naming model in the Entrez Gene database, with low ambiguity between data sources and across species.
dc.language.iso: en
dc.subject: analysis
dc.subject: genes
dc.subject: name synonyms
dc.subject: entrez gene
dc.subject: protein
dc.subject: UniProtKB
dc.title: Analysis of gene and protein name synonyms in Entrez Gene and UniProtKB resources
dc.type: Thesis
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Moshkov, Mikhail
dc.contributor.committeemember: Zhang, Xiangliang
thesis.degree.discipline: Computer Science
thesis.degree.name: Master of Science
refterms.dateFOA: 2014-05-11T00:00:00Z


Files in this item

Name: Master Thesis.pdf
Size: 4.350Mb
Format: PDF
Description: Basil Arkasosy Thesis
