Genetic Algorithms for Models Optimization for Recognition of Translation Initiation Sites
Type
Thesis
Authors
Mora, Arturo Magana
Advisors
Bajic, Vladimir B.
Committee members
Moshkov, Mikhail
Zhang, Xiangliang

Program
Computer Science
Date
2011-06
Permanent link to this record
http://hdl.handle.net/10754/136690
Abstract
This work uses genetic algorithms (GAs) to reduce the complexity of artificial neural networks (ANNs) and decision trees (DTs) for the accurate recognition of translation initiation sites (TISs) in Arabidopsis thaliana. The Arabidopsis data was extracted directly from genomic DNA sequences. The methods derived in this work both reduced the complexity of the predictors and improved their prediction accuracy (generalization). Optimization with a GA is generally a computationally intensive task. One way to address this is to parallelize the code that implements the GA, allowing computation on multiprocessing infrastructure. Further improvement in the performance of a GA implementation can be achieved by modifying the basic GA operations, such as selection, crossover, and mutation. In this thesis we explored two such modifications, namely evolutive mutation and the GA-Simplex crossover operation, and studied their benefit on the problem of TIS recognition. Compared to the non-modified GA approach, we reduced the number of weights in the neural network component of the resulting model by 51% and the number of nodes in its DT component by 97%, while also improving the model's accuracy. Separately, we developed a second methodology for reducing the complexity of prediction models: a new GA-based variant of bootstrap aggregation (bagging) that optimizes the composition of each training data subset. In our test cases, this approach considerably enhanced the accuracy of the TIS prediction model compared to the original bagging methodology. Although these methods are applied here to the accurate prediction of TISs, we believe they have potential for a wider scope of application.
Citation
Mora, A. M. (2011). Genetic Algorithms for Models Optimization for Recognition of Translation Initiation Sites. KAUST Research Repository. https://doi.org/10.25781/KAUST-6S92I
10.25781/KAUST-6S92I
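
Illustrative sketch
The abstract refers to the basic GA operations of selection, crossover, and mutation, used to shrink a model while keeping its accuracy. The following is a minimal Python sketch of that general idea only, not the thesis's implementation. Everything in it is an assumption made for illustration: the binary keep/drop mask, the toy `fitness` function with its `useful` set and `penalty` weight, tournament selection, one-point crossover, and fixed-rate bit-flip mutation. It does not implement the evolutive mutation or GA-Simplex crossover operators, which this record does not define.

```python
import random


def fitness(mask, useful, penalty=0.1):
    """Toy score (assumed): reward kept 'useful' components, penalize size."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in useful)
    return hits - penalty * sum(mask)


def tournament(pop, scores, rng, k=3):
    """Selection: return the best of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Standard one-point crossover (not the GA-Simplex operator)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask, rng, rate=0.02):
    """Standard bit-flip mutation with a fixed rate (not evolutive mutation)."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def run_ga(n_bits=40, useful=frozenset(range(0, 40, 4)),
           pop_size=30, generations=60, seed=0):
    """Evolve a keep/drop mask; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, useful))
    for _ in range(generations):
        scores = [fitness(m, useful) for m in pop]
        children = [best]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
        best = max(pop, key=lambda m: fitness(m, useful))
    return best, fitness(best, useful)


if __name__ == "__main__":
    mask, score = run_ga()
    print("components kept:", sum(mask), "score:", score)
```

In a real setting the toy `fitness` would be replaced by something like validation accuracy of the pruned ANN or DT minus a complexity penalty; because the elite individual is carried over unchanged, the best score never decreases from one generation to the next.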