Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA
Type
Article
Authors
Albaradei, Somayah
Magana-Mora, Arturo
Thafar, Maha A.
Uludag, Mahmut
Bajic, Vladimir B.
Gojobori, Takashi
Essack, Magbubah
Jankovic, Boris R.
KAUST Department
Computer Science Program
Computational Bioscience Research Center (CBRC)
Applied Mathematics and Computational Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Bioscience Program
Biological and Environmental Sciences and Engineering (BESE) Division
KAUST Grant Number
BAS/1/1606-01-01
FCC/1/1976-17-01
Date
2020-05-13
Online Publication Date
2020-05-13
Print Publication Date
2020-12
Submitted Date
2020-03-30
Permanent link to this record
http://hdl.handle.net/10754/662957
Abstract
Background: The accurate identification of exon/intron boundaries is critical for the correct annotation of genes with multiple exons. Donor and acceptor splice sites (SS) demarcate these boundaries. Therefore, deriving accurate computational models to predict SS is useful for the functional annotation of genes and genomes, and for finding alternative SS associated with different diseases. Although various models have been proposed for the in silico prediction of SS, their accuracy must improve before annotation can be considered reliable. Moreover, models are often derived and tested using the same genome, providing no evidence of broader applicability, i.e., to other poorly studied genomes.
Results: With this in mind, we developed the Splice2Deep models for SS detection. Each model is an ensemble of deep convolutional neural networks. We evaluated the performance of the models based on their ability to detect SS in Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana, Drosophila melanogaster, and Caenorhabditis elegans. The results demonstrate that the models efficiently detect SS in other organisms not considered during the training of the models. Compared to the state-of-the-art tools, Splice2Deep models achieved significantly reduced average error rates of 41.97% and 28.51% for acceptor and donor SS, respectively. Moreover, the Splice2Deep cross-organism validation demonstrates that the models correctly identify conserved genomic elements, enabling annotation of SS in new genomes by choosing the taxonomically closest model.
Conclusions: The results of our study demonstrate that Splice2Deep both achieves a considerably reduced error rate compared to other state-of-the-art models and accurately recognizes SS in organisms for which the model was not trained, enabling annotation of poorly studied or newly sequenced genomes.
Splice2Deep models are implemented in Python using the Keras API; the models and the data are available at https://github.com/SomayahAlbaradei/Splice_Deep.git.
Citation
Albaradei, S., Magana-Mora, A., Thafar, M., Uludag, M., Bajic, V. B., Gojobori, T., … Jankovic, B. R. (2020). Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA. Gene: X, 5, 100035. doi:10.1016/j.gene.2020.100035
Sponsors
VBB has been supported by the King Abdullah University of Science and Technology (KAUST) Base Research Fund (BAS/1/1606-01-01); ME has been supported by KAUST Office of Sponsored Research (OSR) Award no. FCC/1/1976-17-01. TG has also been supported by the King Abdullah University of Science and Technology (KAUST) Base Research Fund (BAS/1/1059-01-01).
Publisher
Elsevier BV
Journal
Gene
Additional Links
https://linkinghub.elsevier.com/retrieve/pii/S2590158320300097
Relations
Is Supplemented By: [Software] Title: SomayahAlbaradei/Splice_Deep. Publication Date: 2019-07-31. GitHub: SomayahAlbaradei/Splice_Deep. Handle: 10754/668009
DOI
10.1016/j.gene.2020.100035
Collections
Articles; Biological and Environmental Science and Engineering (BESE) Division; Bioscience Program; Applied Mathematics and Computational Science Program; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Except where otherwise noted, this item's license is described as follows: This is an open access article under the CC BY-NC-ND license.