dc.contributor.advisor: Bajic, Vladimir B.
dc.contributor.author: Mulamba, Pierre Abraham
dc.date.accessioned: 2014-12-07T13:52:27Z
dc.date.available: 2015-12-07T00:00:00Z
dc.date.issued: 2014-12
dc.identifier.doi: 10.25781/KAUST-VM9KK
dc.identifier.uri: http://hdl.handle.net/10754/336791
dc.description.abstract: Finding genes in eukaryotic organisms using computational methods is an ongoing problem in biology. Based on the various genomic signals found in eukaryotic genomes, this problem can be divided into many sub-problems, such as the identification of transcription start sites, translation initiation sites, splice sites, poly(A) signals, etc. Each sub-problem deals with a particular type of genomic signal, and various computational methods are used to solve each one. Aggregating information from all these individual sub-problems can lead to a complete annotation of a gene and its component signals. The fundamental principle of most of these computational methods is the mapping principle: building an input-output model for the prediction of a particular genomic signal based on a set of known input signals and their corresponding output signal. The type of input signals used to build the model is an essential element in most of these computational methods. The common factor of most of these methods is that they are mainly based on statistical analysis of the basic nucleotide sequence string composition. Our study is based on a novel approach to predict genomic signals in which uniquely generated structural profiles that combine compressed physicochemical properties with topological and compositional properties of DNA sequences are used to develop machine learning predictive models. The compression of the physicochemical properties is performed using a principal component analysis transformation. Our ideas are evaluated through prediction models of canonical splice sites using support vector machine models. We demonstrate across several species that the proposed methodology has resulted in the most accurate splice site predictors that are publicly available or described. We believe that the approach in this study is quite general and has various applications in other biological modeling problems.
dc.language.iso: en
dc.subject: Physicochemical
dc.subject: Compositional
dc.subject: Characteristics
dc.subject: Prediction
dc.subject: Genomic
dc.subject: Signals
dc.title: Using physicochemical and compositional characteristics of DNA sequence for prediction of genomic signals
dc.type: Dissertation
dc.contributor.department: Biological and Environmental Sciences and Engineering (BESE) Division
dc.rights.embargodate: 2015-12-07
thesis.degree.grantor: King Abdullah University of Science and Technology
dc.contributor.committeemember: Moshkov, Mikhail
dc.contributor.committeemember: Arold, Stefan T.
dc.contributor.committeemember: Christoffels, Alan
thesis.degree.discipline: Bioscience
thesis.degree.name: Doctor of Philosophy
dc.rights.accessrights: At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation became available to the public after the expiration of the embargo on 2015-12-07.
refterms.dateFOA: 2015-12-07T00:00:00Z
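
The abstract mentions compositional properties of DNA sequences around canonical splice sites. As a rough illustration only, the sketch below enumerates candidate donor sites (the canonical GT dinucleotide) and computes a simple nucleotide-frequency vector for the window around each one. The function names, the window size, and the choice of mononucleotide frequencies are assumptions made for this example; this record does not specify the dissertation's actual features, and the PCA compression of physicochemical properties and the support vector machine step are not shown.

```python
from collections import Counter


def candidate_donor_sites(seq, flank=3):
    """Yield (position, window) for each GT dinucleotide that has
    `flank` bases available on both sides (hypothetical helper)."""
    seq = seq.upper()
    # i is the index of the G; the window spans flank + 2 + flank bases.
    for i in range(flank, len(seq) - flank - 1):
        if seq[i:i + 2] == "GT":
            yield i, seq[i - flank:i + 2 + flank]


def composition(window, alphabet="ACGT"):
    """Return the relative frequency of each base in the window."""
    counts = Counter(window)
    n = len(window)
    return [counts[base] / n for base in alphabet]


# Toy sequence, not real genomic data.
seq = "AAGGTAAGTCC"
sites = list(candidate_donor_sites(seq, flank=3))
features = [composition(window) for _, window in sites]
```

In a fuller pipeline, vectors like these would be only one part of the input to a classifier; here they simply show what a compositional description of a candidate site can look like.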


Files in this item

Name: Pierre Mulamba Mutombo Dissertation.pdf
Size: 4.266Mb
Format: PDF
Description: Pierre Mutombo Dissertation
