RPI-MDLStack: Predicting RNA–protein interactions through deep learning with stacking strategy and LASSO

RNA–protein interactions (RPI) play a crucial role in fundamental cellular physiological processes. Traditional RPI identification relies on expensive and labor-intensive biological experiments, and existing computational methods are far from satisfactory. There is a pressing need for more cost-effective methods to predict RPI. In this study, a stacking ensemble deep learning framework, named RPI-MDLStack, is constructed for RPI prediction. First, sequence, physicochemical, structural, and evolutionary information is obtained from RNA and protein sequences through eight feature extraction methods. Then, the fused features are reduced to an optimal feature subset by removing redundancy with the least absolute shrinkage and selection operator (LASSO). Following the stacking strategy, the optimal feature subset is first learned by a combination of base classifiers composed of multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), gated recurrent unit (GRU), and deep neural network (DNN). Finally, the prediction scores of the base classifiers are fed into a discriminative model for further training. In 5-fold cross-validation, RPI-MDLStack achieves accuracies of 96.7%, 87.3%, 94.6%, 97.1% and 89.5% on RPI488, RPI369, RPI2241, RPI1807, and RPI1446, respectively. Additionally, RPI-MDLStack achieves an overall prediction accuracy of 97.8% on the independent tests when trained on RPI488. Compared with other state-of-the-art RPI prediction methods on the same datasets, RPI-MDLStack is more robust and stable for predicting RPI.
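The general shape of the pipeline described in the abstract (LASSO-based feature selection, base classifiers scored out-of-fold, and a second-level model trained on those scores) can be illustrated with a minimal sketch. This is not the authors' implementation. The synthetic data, the function names, and all hyperparameters are assumptions made for illustration, and two toy base learners (logistic regression and a nearest-centroid scorer) stand in for the MLP, SVM, RF, GRU, and DNN used in the paper.

```python
import math
import random

def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def make_toy_data(n, n_informative=3, n_noise=5, seed=0):
    """Hypothetical stand-in for fused RNA/protein feature vectors."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # balanced classes
        shift = 1.0 if label else -1.0
        row = [rng.gauss(shift, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def lasso_select(X, y, alpha=0.15, sweeps=50):
    """Coordinate-descent LASSO on +/-1 targets; returns indices with nonzero weight.
    Assumes roughly centered features and balanced classes (no intercept is fitted)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = [2.0 * t - 1.0 for t in y]  # residual y - Xw at w = 0
    for _ in range(sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            rho = sum(v * (r + w[j] * v) for v, r in zip(col, resid)) / n
            new_w = math.copysign(max(abs(rho) - alpha, 0.0), rho) / z  # soft threshold
            resid = [r - (new_w - w[j]) * v for r, v in zip(resid, col)]
            w[j] = new_w
    return [j for j in range(d) if w[j] != 0.0]

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression trained by batch gradient descent; returns a scorer."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, t in zip(X, y):
            err = sigmoid(sum(wj * v for wj, v in zip(w, row)) + b) - t
            grad_w = [g + err * v for g, v in zip(grad_w, row)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return lambda row: sigmoid(sum(wj * v for wj, v in zip(w, row)) + b)

def fit_centroid(X, y):
    """Nearest-centroid scorer: higher when a row is closer to the positive centroid."""
    def mean(rows):
        return [sum(c) / len(rows) for c in zip(*rows)]
    pos = mean([r for r, t in zip(X, y) if t == 1])
    neg = mean([r for r, t in zip(X, y) if t == 0])
    def dist2(row, c):
        return sum((v - cv) ** 2 for v, cv in zip(row, c))
    return lambda row: sigmoid(dist2(row, neg) - dist2(row, pos))

BASE_FITTERS = [fit_logistic, fit_centroid]  # toy substitutes for the paper's base classifiers

def fit_stack(X, y, k=5, seed=0):
    """Score each row out-of-fold with every base learner, then train a meta-model on the scores."""
    n = len(X)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    meta_X = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for f in range(k):
        held_out = set(order[f::k])
        train = [i for i in range(n) if i not in held_out]
        for m, fit in enumerate(BASE_FITTERS):
            scorer = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta_X[i][m] = scorer(X[i])
    meta = fit_logistic(meta_X, y, lr=1.0, epochs=500)
    bases = [fit(X, y) for fit in BASE_FITTERS]  # refit base learners on all training rows
    return lambda row: meta([b(row) for b in bases])

X, y = make_toy_data(300)
X_train, y_train, X_test, y_test = X[:240], y[:240], X[240:], y[240:]
selected = lasso_select(X_train, y_train)

def keep(rows):
    return [[row[j] for j in selected] for row in rows]

predict = fit_stack(keep(X_train), y_train)
hits = sum((predict(r) >= 0.5) == bool(t) for r, t in zip(keep(X_test), y_test))
accuracy = hits / len(y_test)
print("selected features:", selected)
print("held-out accuracy:", round(accuracy, 3))
```

Training the second-level model on out-of-fold scores, rather than on scores the base learners produce for rows they were fitted on, is the usual way to keep the meta-model from simply trusting overfitted base predictions. Whether the paper builds its second-level inputs in exactly this way is not stated in the abstract.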

Yu, B., Wang, X., Zhang, Y., Gao, H., Wang, Y., Liu, Y., & Gao, X. (2022). RPI-MDLStack: Predicting RNA–protein interactions through deep learning with stacking strategy and LASSO. Applied Soft Computing, 120, 108676. https://doi.org/10.1016/j.asoc.2022.108676

We thank the anonymous reviewers for their valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award Nos. FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4379-01-01, REI/1/4742-01-01 and URF/1/4098-01-01.

Elsevier BV

Applied Soft Computing

