Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

Handle URI:
http://hdl.handle.net/10754/625354
Title:
Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA
Authors:
Magana-Mora, Arturo (0000-0001-8696-7068); Kalkatawi, Manal Matoq Saeed; Bajic, Vladimir B. (0000-0001-5435-4750)
Abstract:
Background: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. Results: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. Conclusions: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.
KAUST Department:
Computational Bioscience Research Center (CBRC)
Citation:
Magana-Mora A, Kalkatawi M, Bajic VB (2017) Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA. BMC Genomics 18. Available: http://dx.doi.org/10.1186/s12864-017-4033-7.
Publisher:
Springer Nature
Journal:
BMC Genomics
Issue Date:
15-Aug-2017
DOI:
10.1186/s12864-017-4033-7
Type:
Article
ISSN:
1471-2164
Sponsors:
This research made use of the resources of CBRC and IT Research Computing at King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. This work was supported by King Abdullah University of Science and Technology (KAUST) through the baseline fund BAS/1/1606–01-01 of VBB.
Additional Links:
http://link.springer.com/article/10.1186/s12864-017-4033-7
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC)
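
Illustrative note on the abstract: PAS recognition starts from candidate hexamers in genomic DNA, which a model must then label as true or false signals. The sketch below only covers that first candidate-enumeration step and is not the Omni-PolyA method itself. The function name `find_pas_candidates` is hypothetical, and the variant list is an assumption restricted to the two canonical hexamers (AATAAA and ATTAAA), because this record does not enumerate the 12 variants analyzed in the study.

```python
import re

# Assumption: only the two canonical PAS hexamers are listed here; the
# record does not spell out the 12 variants used in the study.
PAS_VARIANTS = ("AATAAA", "ATTAAA")


def find_pas_candidates(sequence, variants=PAS_VARIANTS):
    """Return sorted (position, hexamer) pairs for every variant occurrence.

    Positions are 0-based offsets into `sequence`. Each hit is only a
    candidate: deciding whether it is a functional PAS requires a
    trained classifier, which this sketch does not implement.
    """
    seq = sequence.upper()
    hits = []
    for variant in variants:
        # A zero-width lookahead reports overlapping occurrences too.
        for match in re.finditer(f"(?={variant})", seq):
            hits.append((match.start(), variant))
    return sorted(hits)


if __name__ == "__main__":
    for position, hexamer in find_pas_candidates("ccAATAAAggATTAAAtt"):
        print(position, hexamer)
```

Each reported position would then be passed, together with its flanking sequence, to a recognition model that separates true signals from chance occurrences of the same hexamer.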

Full metadata record

DC Field | Value | Language
dc.contributor.author | Magana-Mora, Arturo | en
dc.contributor.author | Kalkatawi, Manal Matoq Saeed | en
dc.contributor.author | Bajic, Vladimir B. | en
dc.date.accessioned | 2017-08-17T06:37:53Z | -
dc.date.available | 2017-08-17T06:37:53Z | -
dc.date.issued | 2017-08-15 | en
dc.identifier.citation | Magana-Mora A, Kalkatawi M, Bajic VB (2017) Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA. BMC Genomics 18. Available: http://dx.doi.org/10.1186/s12864-017-4033-7. | en
dc.identifier.issn | 1471-2164 | en
dc.identifier.doi | 10.1186/s12864-017-4033-7 | en
dc.identifier.uri | http://hdl.handle.net/10754/625354 | -
dc.description.abstract | Background: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. Results: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. Conclusions: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/. | en
dc.description.sponsorship | This research made use of the resources of CBRC and IT Research Computing at King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia. This work was supported by King Abdullah University of Science and Technology (KAUST) through the baseline fund BAS/1/1606–01-01 of VBB. | en
dc.publisher | Springer Nature | en
dc.relation.url | http://link.springer.com/article/10.1186/s12864-017-4033-7 | en
dc.rights | This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | en
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en
dc.subject | Polyadenylation | en
dc.subject | Prediction | en
dc.subject | Genomic DNA | en
dc.subject | Machine learning | en
dc.subject | Bioinformatics | en
dc.title | Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.identifier.journal | BMC Genomics | en
dc.eprint.version | Publisher's Version/PDF | en
kaust.author | Magana-Mora, Arturo | en
kaust.author | Kalkatawi, Manal Matoq Saeed | en
kaust.author | Bajic, Vladimir B. | en