
Field | Value | Language
dc.contributor.advisor | Bajic, Vladimir B. | *
dc.contributor.author | Arkasosy, Basil | *
dc.date.accessioned | 2013-06-03T18:43:39Z |
dc.date.available | 2013-06-03T18:43:39Z |
dc.date.issued | 2013-05-11 | en
dc.identifier.doi | 10.25781/KAUST-N0Q62 |
dc.identifier.uri | http://hdl.handle.net/10754/293325 | en
dc.description.abstract | Ambiguity in texts is a well-known problem: words can carry several meanings and hence can be read and interpreted differently. This is also true in the biological literature: names of biological concepts, such as genes and proteins, may be ambiguous, referring in some cases to more than one gene or protein, and in others to both a gene and a protein at the same time. Public biological databases provide useful insight into gene and protein information, including their names. In this study, we made a thorough analysis of gene and protein nomenclature in two data sources for six different species. We developed an automated process that parses, extracts, processes and stores information available in two major biological databases: Entrez Gene and UniProtKB. We analysed gene and protein synonyms, their types and frequencies, and the ambiguities within a species, between data sources and across species. We found that at least 40% of the cross-species ambiguities are caused by names that are already ambiguous within the species. Our study shows that, of the six species we analysed (Homo sapiens, Mus musculus, Arabidopsis thaliana, Oryza sativa, Bacillus subtilis and Pseudomonas fluorescens), rice (Oryza sativa) has the best naming model in the Entrez Gene database, with low ambiguity between data sources and across species. | en
dc.language.iso | en | en
dc.subject | analysis | en
dc.subject | genes | en
dc.subject | name synonyms | en
dc.subject | entrez gene | en
dc.subject | protein | en
dc.subject | UniProtKB | en
dc.title | Analysis of gene and protein name synonyms in Entrez Gene and UniProtKB resources | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | *
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Moshkov, Mikhail | *
dc.contributor.committeemember | Zhang, Xiangliang | *
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en
refterms.dateFOA | 2014-05-11T00:00:00Z |


Files in this item

Name: Master Thesis.pdf
Size: 4.350 MB
Format: PDF
Description: Basil Arkasosy Thesis

