miProBERT: identification of microRNA promoters based on the pre-trained model BERT.
KAUST Department: Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
Computer Science Program
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Embargo End Date: 2024-03-17
Permanent link to this record: http://hdl.handle.net/10754/690409
Abstract: Accurate prediction of promoter regions driving miRNA gene expression has become a major challenge due to the lack of annotation information for pri-miRNA transcripts. This gap hinders our understanding of miRNA-mediated regulatory networks. Some algorithms have been designed during the past decade to detect miRNA promoters. However, these methods rely on biosignal data such as CpG islands and still need to be improved. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We take the pre-trained model DNABERT, fine-tune it on the gene promoter dataset so that its representation captures the richer biological properties of promoter sequences, and then systematically scan the upstream regions of each intergenic miRNA using the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct a negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperformed other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperformed previously developed miRNA promoter prediction programs, with 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing conservation, CpG content and histone marks. These results highlight the effectiveness and robustness of miProBERT.
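The abstract names three sequence-level steps without giving details: building negatives by random substitution, feeding sequences to DNABERT (which tokenizes DNA as overlapping k-mers), and scanning upstream regions with the fine-tuned model. The following is a minimal Python sketch of what those steps might look like, not the authors' implementation. The function names and the parameters `n_parts`, `n_replaced`, `window` and `step` are hypothetical, and their default values are illustrative assumptions rather than settings from the paper.

```python
import random

NUCLEOTIDES = "ACGT"


def random_substitution_negative(seq, n_parts=20, n_replaced=12, rng=None):
    """Build a negative example by randomizing some segments of a promoter.

    Assumption: the sequence is cut into n_parts segments and n_replaced of
    them are overwritten with random nucleotides; the rest are kept as-is.
    Both values are illustrative, not taken from the abstract.
    """
    rng = rng or random.Random()
    part_len = len(seq) // n_parts
    # Segment boundaries; the last segment absorbs any remainder.
    bounds = [(i * part_len, (i + 1) * part_len) for i in range(n_parts)]
    bounds[-1] = (bounds[-1][0], len(seq))
    replaced = set(rng.sample(range(n_parts), n_replaced))
    pieces = []
    for idx, (start, end) in enumerate(bounds):
        if idx in replaced:
            # Overwrite this segment with uniformly random nucleotides.
            pieces.append("".join(rng.choice(NUCLEOTIDES) for _ in range(end - start)))
        else:
            # Keep the original promoter segment unchanged.
            pieces.append(seq[start:end])
    return "".join(pieces)


def kmer_tokens(seq, k=6):
    """Split a sequence into overlapping k-mers (DNABERT-style tokens)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def scan_windows(upstream, window=300, step=50):
    """Yield (offset, subsequence) windows over an upstream region.

    Assumption: fixed-size windows with a fixed stride; each window would be
    scored by the fine-tuned classifier, which is not shown here.
    """
    for start in range(0, len(upstream) - window + 1, step):
        yield start, upstream[start:start + window]
```

In this sketch a negative keeps the length and part of the content of the promoter it was derived from, which is one plausible reason such negatives are harder to separate than unrelated genomic sequence. Each window from `scan_windows` would be converted with `kmer_tokens` before being passed to the model.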
Citation: Wang, X., Gao, X., Wang, G., & Li, D. (2023). miProBERT: identification of microRNA promoters based on the pre-trained model BERT. Briefings in Bioinformatics. https://doi.org/10.1093/bib/bbad093
Sponsors: National Natural Science Foundation of China (62225109, 62072095), Fundamental Research Funds for the Central Universities (No. HIT.BRET.2022003) and National Key R&D Program of China (2021YFC2100101).
Publisher: Oxford University Press (OUP)
Journal: Briefings in Bioinformatics
Relations: Is Supplemented By:
- MicroRNA transcription start site prediction with multi-objective feature selection.
- Authors: Bhattacharyya M, Feuerbach L, Bhadra T, Lengauer T, Bandyopadhyay S
- Issue date: 2012 Jan 6
- RNA polymerase II binding patterns reveal genomic regions involved in microRNA gene regulation.
- Authors: Wang G, Wang Y, Shen C, Huang YW, Huang K, Huang TH, Nephew KP, Li L, Liu Y
- Issue date: 2010 Nov 2
- BERT-Promoter: An improved sequence-based predictor of DNA promoter using BERT pre-trained model and SHAP feature selection.
- Authors: Le NQK, Ho QT, Nguyen VN, Chang JS
- Issue date: 2022 Aug
- Annotation of mammalian primary microRNAs.
- Authors: Saini HK, Enright AJ, Griffiths-Jones S
- Issue date: 2008 Nov 27
- MicroRNA Promoter Identification in Human with a Three-level Prediction Method.
- Authors: Wang X, Li J, Wang G
- Issue date: 2023 Aug 17