Genetic Algorithms for Models Optimization for Recognition of Translation Initiation Sites

Handle URI:
http://hdl.handle.net/10754/136690
Title:
Genetic Algorithms for Models Optimization for Recognition of Translation Initiation Sites
Authors:
Mora, Arturo Magana
Abstract:
This work uses genetic algorithms (GAs) to reduce the complexity of artificial neural networks (ANNs) and decision trees (DTs) for the accurate recognition of translation initiation sites (TISs) in Arabidopsis thaliana. The Arabidopsis data were extracted directly from genomic DNA sequences. The methods derived in this work both reduced the complexity of the predictors and improved prediction accuracy (generalization). Optimization with a GA is generally computationally intensive. One way to overcome this problem is to parallelize the code that implements the GA, allowing computation on multiprocessing infrastructure. Further performance gains can be achieved by modifying basic GA operations such as selection, crossover, and mutation. In this thesis we explored two such modifications, namely evolutive mutation and the GA-Simplex crossover operation, and studied their benefit on the problem of TIS recognition. Compared to the non-modified GA approach, we reduced the number of weights in the resulting model's neural network component by 51% and the number of nodes in the model's DT component by 97%, while also improving the model's accuracy. Separately, we developed another methodology for reducing the complexity of prediction models: a new GA-based bagging methodology that optimizes the composition of each training data subset used in bootstrap aggregation (bagging). In our test cases, this approach considerably enhanced the accuracy of the TIS prediction model compared to the original bagging methodology. Although these methods are applied here to the accurate prediction of TISs, we believe they have potential for a wider scope of application.
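For readers unfamiliar with the basic GA operations named in the abstract (selection, crossover, mutation), the following is a minimal, generic sketch. It is not the thesis's implementation: it does not include evolutive mutation or the GA-Simplex crossover, and the function `run_ga`, the toy objective `toy_fitness`, the set `USEFUL`, and all parameter values are hypothetical names chosen for illustration. The toy objective only mimics the idea of keeping useful model components while penalizing complexity.

```python
import random


def run_ga(fitness, n_bits, pop_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    """Generic GA over bit masks (illustrative only, not the thesis's method).

    Uses tournament selection, one-point crossover, bit-flip mutation,
    and elitism. Requires n_bits >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical toy objective: a mask over 12 model components, where
# components 0, 3 and 7 are assumed useful and every kept component
# costs one unit of complexity.
USEFUL = {0, 3, 7}


def toy_fitness(mask):
    kept_useful = sum(mask[i] for i in USEFUL)
    return 10 * kept_useful - sum(mask)


best_mask = run_ga(toy_fitness, n_bits=12)
print(best_mask, toy_fitness(best_mask))
```

In a real setting the fitness function would train and validate a model for each candidate mask, which is why the abstract describes GA optimization as computationally intensive.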
Advisors:
Bajic, Vladimir B. (0000-0001-5435-4750)
Committee Member:
Moshkov, Mikhail (0000-0003-0085-9483); Zhang, Xiangliang (0000-0002-3574-5665)
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Program:
Computer Science
Issue Date:
Jun-2011
Type:
Thesis
Appears in Collections:
Theses; Computer Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | Bajic, Vladimir B. | en
dc.contributor.author | Mora, Arturo Magana | en
dc.date.accessioned | 2011-07-24T07:52:51Z | -
dc.date.available | 2011-07-24T07:52:51Z | -
dc.date.issued | 2011-06 | en
dc.identifier.uri | http://hdl.handle.net/10754/136690 | en
dc.language.iso | en | en
dc.title | Genetic Algorithms for Models Optimization for Recognition of Translation Initiation Sites | en
dc.type | Thesis | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
thesis.degree.grantor | King Abdullah University of Science and Technology | en_GB
dc.contributor.committeemember | Moshkov, Mikhail | en
dc.contributor.committeemember | Zhang, Xiangliang | en
thesis.degree.discipline | Computer Science | en
thesis.degree.name | Master of Science | en