Genetic Algorithms for Models Optimization for Recognition of Translation Initiation Sites
Authors: Mora, Arturo Magana
Advisors: Bajic, Vladimir B.
Permanent link to this record: http://hdl.handle.net/10754/136690
Abstract: This work uses genetic algorithms (GAs) to reduce the complexity of artificial neural networks (ANNs) and decision trees (DTs) for the accurate recognition of translation initiation sites (TISs) in Arabidopsis thaliana. The Arabidopsis data was extracted directly from genomic DNA sequences. The methods derived in this work both reduced the complexity of the predictors and improved prediction accuracy (generalization). Optimization with GAs is generally computationally intensive. One way to overcome this problem is to parallelize the code that implements the GA, allowing computation on multiprocessing infrastructure. However, further improvement in the performance of a GA implementation can be achieved by modifying the basic GA operations, such as selection, crossover, and mutation. In this thesis we explored two such modifications, namely evolutive mutation and the GA-Simplex crossover operation, and studied their benefit on the problem of TIS recognition. Compared to the non-modified GA approach, we reduced the number of weights in the resulting model's neural network component by 51% and the number of nodes in the model's DT component by 97%, while also improving the model's accuracy. Separately, we developed another methodology for reducing the complexity of prediction models: a new GA-based bagging methodology that optimizes the composition of each training data subset used in bootstrap aggregation (bagging). In our test cases, this approach considerably enhanced the accuracy of the TIS prediction model compared to the original bagging methodology. Although these methods are applied here to the accurate prediction of TISs, we believe they have potential for a wider scope of application.
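For readers unfamiliar with the basic GA operations named in the abstract (selection, crossover, and mutation), the following is a minimal illustrative sketch in Python. It is not the thesis implementation: the function names `run_ga` and `toy_fitness`, the binary-mask encoding, the tournament selection, the single-point crossover, and all parameter values are assumptions chosen for illustration. The thesis-specific evolutive mutation and GA-Simplex crossover are not shown, since the abstract does not describe them in detail. The toy fitness function only mimics the general idea of rewarding a small set of useful model components while penalizing model size.

```python
import random

# Hypothetical toy setup: a 12-bit mask marks which model components are kept.
# Components 0, 3, and 7 are assumed to be the only useful ones.
USEFUL = {0, 3, 7}

def toy_fitness(mask):
    """Reward useful components that are kept; penalize every kept component."""
    kept_useful = sum(1 for i in USEFUL if mask[i] == 1)
    return 10 * kept_useful - sum(mask)

def run_ga(fitness, n_bits, pop_size=30, generations=60,
           crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Plain GA over bit strings: tournament selection, single-point
    crossover, bit-flip mutation, and one-individual elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select():
        # Binary tournament: pick two distinct individuals, keep the fitter.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

mask = run_ga(toy_fitness, n_bits=12)
print(mask, toy_fitness(mask))
```

Because the best individual is copied into each new generation, the best fitness never decreases from one generation to the next. In a model-reduction setting, the fitness function would instead score a trained predictor, which is where the computational cost mentioned in the abstract arises.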