RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins
KAUST Department: Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
Embargo End Date: 2023-06-02
Permanent link to this record: http://hdl.handle.net/10754/678590
Abstract: RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Computational methods that can accurately identify RBPs are therefore highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, knowledge from a self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences. In the second stage, a customized deep learning model was initialized on an annotated pre-training RBP dataset and then fine-tuned on each target species dataset. This two-stage transfer learning framework enables the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive benchmarking of RBP-TSTL models trained on features generated by the self-supervised pre-trained model against models trained on hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella, and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that RBP-TSTL will serve as a useful tool for characterizing RNA-binding proteins and exploring their sequence-structure-function relationships.
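The two-stage strategy described in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' RBP-TSTL implementation, and several parts are assumptions. The `embed` function is a hypothetical stand-in for the self-supervised pre-trained protein model: it uses plain amino-acid composition only so that the example runs without external dependencies. The customized deep learning model is swapped for simple logistic regression. The sequences and labels are invented toy data. The sketch shows the order of operations: fixed embeddings first, then a classifier initialized on a general annotated dataset and afterwards fine-tuned on a target-species dataset.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Stage 1 stand-in (hypothetical): map a protein sequence to a fixed vector.

    RBP-TSTL uses embeddings from a self-supervised pre-trained model; amino-acid
    composition is used here only so the sketch runs without that model.
    """
    total = max(len(sequence), 1)
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that embedding x belongs to an RBP."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data, weights=None, bias=0.0, lr=0.5, epochs=200, seed=0):
    """SGD logistic regression; passing weights/bias warm-starts (fine-tunes) the model."""
    weights = list(weights) if weights is not None else [0.0] * len(data[0][0])
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict_proba(weights, bias, x) - y  # gradient of log-loss w.r.t. logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy, made-up sequences (not from the paper's datasets); label 1 = RBP, 0 = non-RBP.
source = [("RGGKRGGRKGYG", 1), ("KRRGGSGRKK", 1), ("LLAEVIDLLE", 0), ("AVLEELIDAV", 0)]
target = [("RKGGRGYRGG", 1), ("ELLVDAILEA", 0)]

# Stage 1: represent every sequence by its (frozen) embedding.
source_x = [(embed(s), y) for s, y in source]
target_x = [(embed(s), y) for s, y in target]

# Stage 2a: initialize the classifier on the annotated pre-training dataset.
w_src, b_src = train(source_x)
# Stage 2b: fine-tune on the target species, starting from the stage-2a weights.
w_ft, b_ft = train(target_x, weights=w_src, bias=b_src, lr=0.1, epochs=50)

print(round(predict_proba(w_ft, b_ft, embed("GRKGRRGG")), 3))
```

Passing `weights` and `bias` into the second `train` call is what turns it into a fine-tuning step rather than training from scratch. The lower learning rate and smaller number of epochs used for fine-tuning are illustrative choices, not values reported in the paper.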
Citation: Peng, X., Wang, X., Guo, Y., Ge, Z., Li, F., Gao, X., & Song, J. (2022). RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins. Briefings in Bioinformatics. https://doi.org/10.1093/bib/bbac215
Sponsors: National Health and Medical Research Council of Australia (NHMRC) (grant nos. APP1127948, APP1144652); Australian Research Council (ARC) (grant nos. LP110200333, DP120104460); National Institute of Allergy and Infectious Diseases of the National Institutes of Health (grant no. R01 AI111965); Major Inter-Disciplinary Research (IDR) project awarded by Monash University.
Publisher: Oxford University Press (OUP)
Journal: Briefings in Bioinformatics
Relations: Is Supplemented By:
- PreRBP-TL: prediction of species-specific RNA-binding proteins based on transfer learning.
  - Authors: Zhang J, Yan K, Chen Q, Liu B
  - Issue date: 2022 Apr 12
- RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.
  - Authors: Pan X, Shen HB
  - Issue date: 2017 Feb 28
- RNA-binding protein recognition based on multi-view deep feature and multi-label learning.
  - Authors: Yang H, Deng Z, Pan X, Shen HB, Choi KS, Wang L, Wang S, Wu J
  - Issue date: 2021 May 20
- Integrating thermodynamic and sequence contexts improves protein-RNA binding prediction.
  - Authors: Su Y, Luo Y, Zhao X, Liu Y, Peng J
  - Issue date: 2019 Sep
- A deep learning framework for modeling structural features of RNA-binding protein targets.
  - Authors: Zhang S, Zhou J, Hu H, Gong H, Chen L, Cheng C, Zeng J
  - Issue date: 2016 Feb 29