dc.contributor.author: Zheng, Yumin
dc.contributor.author: Wang, Haohan
dc.contributor.author: Zhang, Yang
dc.contributor.author: Gao, Xin
dc.contributor.author: Xing, Eric P.
dc.contributor.author: Xu, Min
dc.date.accessioned: 2020-11-09T11:34:17Z
dc.date.available: 2020-11-09T11:34:17Z
dc.date.issued: 2020-11-05
dc.date.submitted: 2020-03-25
dc.identifier.citation: Zheng, Y., Wang, H., Zhang, Y., Gao, X., Xing, E. P., & Xu, M. (2020). Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species. PLOS Computational Biology, 16(11), e1008297. doi:10.1371/journal.pcbi.1008297
dc.identifier.issn: 1553-7358
dc.identifier.pmid: 33151940
dc.identifier.doi: 10.1371/journal.pcbi.1008297
dc.identifier.uri: http://hdl.handle.net/10754/665869
dc.description.abstract: In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of the poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotated data hinders the application of existing methods to species without experimental PAS data. Cross-species PAS identification, which makes it possible to predict PAS in species absent from the training data, is therefore a promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use three species: we build cross-species training sets from two of them and evaluate performance on the remaining one. Moreover, we test our method under insufficient-data and imbalanced-data conditions and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with a smaller or imbalanced training set.
dc.description.sponsorship: This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. URF/1/2602-01 and URF/1/3007-01. This work was supported in part by U.S. National Institutes of Health (NIH) grants P41-GM103712, R01-GM134020, R01-GM093156, and P30-DA035778. This work was supported in part by U.S. National Science Foundation (NSF) grants DBI-1949629 and IIS-2007595. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
dc.publisher: Public Library of Science (PLoS)
dc.relation.url: https://dx.plos.org/10.1371/journal.pcbi.1008297
dc.rights: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species
dc.type: Article
dc.contributor.department: Computational Bioscience Research Center (CBRC)
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Structural and Functional Bioinformatics Group
dc.identifier.journal: PLOS Computational Biology
dc.eprint.version: Publisher's Version/PDF
dc.contributor.institution: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom.
dc.contributor.institution: Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
dc.contributor.institution: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
dc.contributor.institution: Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
dc.identifier.volume: 16
dc.identifier.issue: 11
dc.identifier.pages: e1008297
kaust.person: Gao, Xin
kaust.grant.number: URF/1/2602-01
kaust.grant.number: URF/1/3007-01
dc.date.accepted: 2020-08-30
refterms.dateFOA: 2020-11-09T11:44:25Z
kaust.acknowledged.supportUnit: Office of Sponsored Research (OSR)
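
The evaluation protocol described in the abstract (train on two of three species, then test on the third without re-training) can be pictured as a leave-one-species-out split. The following is only a minimal sketch of that splitting step, not the authors' code: the species names, the toy hexamer sequences, and the function name `leave_one_species_out` are hypothetical placeholders, and no model is trained here.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical container: species name -> list of sequences for that species.
Dataset = Dict[str, List[str]]


def leave_one_species_out(
    data: Dataset,
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (target_species, training_sequences, test_sequences).

    Each species is held out once as the target; the training set is
    pooled from all remaining species, so the target is never seen
    during training.
    """
    for target in sorted(data):
        train = [
            seq
            for name in sorted(data)
            if name != target
            for seq in data[name]
        ]
        yield target, train, list(data[target])


# Toy placeholder data (not from the paper).
toy = {
    "species_a": ["AATAAA", "ATTAAA"],
    "species_b": ["AATAAA"],
    "species_c": ["AGTAAA", "AATACA"],
}

splits = list(leave_one_species_out(toy))
for target, train, test in splits:
    print(target, len(train), len(test))
```

With three species this yields three splits, one per held-out target, which matches the "two for training, one for evaluation" setup the abstract describes.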


Files in this item

Name: journal.pcbi.1008297.pdf
Size: 3.606Mb
Format: PDF
Description: Published version
