Contributions to In Silico Genome Annotation

Handle URI:
http://hdl.handle.net/10754/626265
Title:
Contributions to In Silico Genome Annotation
Authors:
Kalkatawi, Manal M. ( 0000-0002-9820-4129 )
Abstract:
Genome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered as a way of summarizing part of existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying functions of these regions is considered as a functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and timeconsuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameters calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally, we focus on deriving a model capable of facilitating the functional annotation of prokaryotes. As far as we know, there is no fully automated system for detailed comparison of functional annotations generated by different methods. Hence, we developed BEACON, a method and supporting system that compares gene annotation from various methods to produce a more reliable and comprehensive annotation. Overall, our research contributed to different aspects of the genome annotation.
Advisors:
Bajic, Vladimir B. ( 0000-0001-5435-4750 )
Committee Member:
Moshkov, Mikhail ( 0000-0003-0085-9483 ) ; Arold, Stefan T. ( 0000-0001-5278-0668 ) ; Zhang, Zhang
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
30-Nov-2017
Type:
Dissertation
Appears in Collections:
Dissertations

Full metadata record

DC FieldValue Language
dc.contributor.advisorBajic, Vladimir B.en
dc.contributor.authorKalkatawi, Manal M.en
dc.date.accessioned2017-12-03T08:34:06Z-
dc.date.available2017-12-03T08:34:06Z-
dc.date.issued2017-11-30-
dc.identifier.urihttp://hdl.handle.net/10754/626265-
dc.description.abstractGenome annotation is an important topic since it provides information for the foundation of downstream genomic and biological research. It is considered as a way of summarizing part of existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying functions of these regions is considered as a functional annotation. In silico approaches can facilitate both tasks that otherwise would be difficult and timeconsuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of the polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature-set able to characterize properties of the genomic region surrounding the PAS, enabling development of high accuracy optimized ML predictive models. DPS considerably outperformed the state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no available generic and automated method exists for such task that could facilitate the studies of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameters calibration. DeepGSR, which was evaluated on recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study, compared to some other hand-tailored models, while producing high accuracy prediction results. Finally, we focus on deriving a model capable of facilitating the functional annotation of prokaryotes. As far as we know, there is no fully automated system for detailed comparison of functional annotations generated by different methods. Hence, we developed BEACON, a method and supporting system that compares gene annotation from various methods to produce a more reliable and comprehensive annotation. Overall, our research contributed to different aspects of the genome annotation.en
dc.language.isoenen
dc.subjectrecognitionen
dc.subjectpredictionen
dc.subjectgenomic signalsen
dc.subjectgenome annotationen
dc.subjectdeep learningen
dc.subjectgenomic regionsen
dc.titleContributions to In Silico Genome Annotationen
dc.typeDissertationen
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Divisionen
thesis.degree.grantorKing Abdullah University of Science and Technologyen
dc.contributor.committeememberMoshkov, Mikhailen
dc.contributor.committeememberArold, Stefan T.en
dc.contributor.committeememberZhang, Zhangen
thesis.degree.disciplineComputer Scienceen
thesis.degree.nameDoctor of Philosophyen
dc.person.id101799en
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.