miProBERT: identification of microRNA promoters based on the pre-trained model BERT.
Type
Article
Authors
Wang, Xin
Gao, Xin
Wang, Guohua
Li, Dan
KAUST Department
Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Computer Science Program
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Date
2023-03-17
Embargo End Date
2024-03-17
Permanent link to this record
http://hdl.handle.net/10754/690409
Abstract
Accurate prediction of promoter regions driving miRNA gene expression has become a major challenge due to the lack of annotation information for pri-miRNA transcripts. This gap hinders our understanding of miRNA-mediated regulatory networks. Several algorithms have been designed during the past decade to detect miRNA promoters. However, these methods rely on biosignal data such as CpG islands and still need to be improved. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We start from the pre-trained model DNABERT, fine-tune it on a gene promoter dataset so that its representation captures the richer biological properties of promoter sequences, and then systematically scan the upstream regions of each intergenic miRNA using the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct a negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperformed other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperformed previously developed miRNA promoter prediction programs, with 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing conservation, CpG content and histone marks. These results highlight the effectiveness and robustness of miProBERT.
Citation
Wang, X., Gao, X., Wang, G., & Li, D. (2023). miProBERT: identification of microRNA promoters based on the pre-trained model BERT. Briefings in Bioinformatics. https://doi.org/10.1093/bib/bbad093
Sponsors
National Natural Science Foundation of China (62225109, 62072095), Fundamental Research Funds for the Central Universities (No. HIT.BRET.2022003) and National Key R&D Program of China (2021YFC2100101).
Publisher
Oxford University Press (OUP)
Journal
Briefings in Bioinformatics
PubMed ID
36929862
Relations
Is Supplemented By:
- [Software] Title: xwang1427/miProBERT. Publication Date: 2022-11-21. GitHub: xwang1427/miProBERT. Handle: 10754/691998
DOI
10.1093/bib/bbad093
Related articles
- MicroRNA transcription start site prediction with multi-objective feature selection.
- Authors: Bhattacharyya M, Feuerbach L, Bhadra T, Lengauer T, Bandyopadhyay S
- Issue date: 2012 Jan 6
- RNA polymerase II binding patterns reveal genomic regions involved in microRNA gene regulation.
- Authors: Wang G, Wang Y, Shen C, Huang YW, Huang K, Huang TH, Nephew KP, Li L, Liu Y
- Issue date: 2010 Nov 2
- BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection.
- Authors: Le NQK, Ho QT, Nguyen VN, Chang JS
- Issue date: 2022 Aug
- Annotation of mammalian primary microRNAs.
- Authors: Saini HK, Enright AJ, Griffiths-Jones S
- Issue date: 2008 Nov 27
- MicroRNA Promoter Identification in Human with a Three-level Prediction Method.
- Authors: Wang X, Li J, Wang G
- Issue date: 2023 Aug 17
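
Illustrative sketch
The abstract names three sequence-handling steps without giving details: building negatives by random substitution, feeding sequences to DNABERT, and scanning upstream regions. The Python sketch below shows one plausible reading of each step and is not the authors' implementation. The function names, the split into 20 parts with 12 substituted, the k-mer size of 6, and the window and stride values are all assumptions made for illustration; this record does not state them.

```python
import random

NUCLEOTIDES = "ACGT"


def random_substitution_negative(seq, n_parts=20, n_replace=12, rng=None):
    """Build a negative example from a promoter sequence.

    Assumed scheme (not confirmed by the record): split the sequence into
    n_parts contiguous chunks, keep some chunks unchanged, and overwrite
    n_replace randomly chosen chunks with random nucleotides.
    """
    rng = rng or random.Random()
    if not 0 <= n_replace <= n_parts <= len(seq):
        raise ValueError("need 0 <= n_replace <= n_parts <= len(seq)")
    # Integer boundaries guarantee every chunk is non-empty.
    bounds = [i * len(seq) // n_parts for i in range(n_parts + 1)]
    chosen = set(rng.sample(range(n_parts), n_replace))
    pieces = []
    for i in range(n_parts):
        chunk = seq[bounds[i]:bounds[i + 1]]
        if i in chosen:
            chunk = "".join(rng.choice(NUCLEOTIDES) for _ in chunk)
        pieces.append(chunk)
    return "".join(pieces)


def kmer_tokens(seq, k=6):
    """Overlapping k-mers, the token style DNABERT-type models consume."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sliding_windows(seq, window, stride):
    """Yield (offset, subsequence) pairs covering an upstream region."""
    for start in range(0, len(seq) - window + 1, stride):
        yield start, seq[start:start + window]
```

In such a setup, each window from `sliding_windows` would be tokenized with `kmer_tokens` and scored by the fine-tuned classifier, while `random_substitution_negative` would supply hard negatives that still share conserved stretches with real promoters.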