Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis.
Type
Article
KAUST Department
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Date
2021-06-01
Online Publication Date
2021-06-01
Print Publication Date
2021-07
Embargo End Date
2022-06-13
Submitted Date
2020-12-29
Permanent link to this record
http://hdl.handle.net/10754/669594
Abstract
Predicting protein-protein interaction sites (PPI sites) can provide important clues for understanding biological activity. Using machine learning to predict PPI sites can reduce the need for expensive and time-consuming biological experiments. Here we propose PPISP-XGBoost, a novel PPI site prediction method based on eXtreme gradient boosting (XGBoost). First, feature information is extracted from the protein using the pseudo-position specific scoring matrix (PsePSSM), pseudo-amino acid composition (PseAAC), hydropathy index and solvent accessible surface area (ASA) under a sliding window. Next, these raw features are preprocessed to obtain better representations and thereby improve prediction. In particular, the synthetic minority oversampling technique (SMOTE) is used to address class imbalance, and kernel principal component analysis (KPCA) is applied to remove redundant features. Finally, the optimized features are fed to the XGBoost classifier to identify PPI sites. Using PPISP-XGBoost, the prediction accuracy on the training dataset Dset186 reaches 85.4%, and the accuracy on the independent validation datasets Dtestset72, PDBtestset164, Dset_448 and Dset_355 reaches 85.3%, 83.9%, 85.8% and 85.4%, respectively, all of which improve on the accuracy of existing PPI site prediction methods. These results demonstrate that PPISP-XGBoost can further improve the prediction of PPI sites.
Citation
Wang, X., Zhang, Y., Yu, B., Salhi, A., Chen, R., Wang, L., & Liu, Z. (2021). Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis. Computers in Biology and Medicine, 134, 104516. doi:10.1016/j.compbiomed.2021.104516
Sponsors
This work was supported by the National Natural Science Foundation of China (No. 61863010), the Key Research and Development Program of Shandong Province of China (No. 2019GGX101001), and the Key Laboratory Open Foundation of Hainan Province (No. JSKX202001).
Publisher
Elsevier BV
PubMed ID
34119922
Additional Links
https://linkinghub.elsevier.com/retrieve/pii/S0010482521003103
DOI
10.1016/j.compbiomed.2021.104516
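The sliding-window feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which hydropathy scale, window width, or end-of-sequence padding was used, so the Kyte-Doolittle scale, the default width of 9, zero padding, and the function name `window_features` are all assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: the record does not say which hydropathy scale the paper uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, window=9, pad=0.0):
    """Return one feature vector per residue.

    Each vector holds the hydropathy values of the residue and its
    neighbours inside a centred window. Positions beyond the sequence
    ends, and non-standard residues, are filled with `pad`
    (hypothetical choices; the source does not specify them).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    values = [KYTE_DOOLITTLE.get(aa, pad) for aa in sequence.upper()]
    padded = [pad] * half + values + [pad] * half
    # Slice i covers original positions i - half .. i + half.
    return [padded[i:i + window] for i in range(len(values))]


# Toy sequence with a window of width 3.
features = window_features("MKVLA", window=3)
for residue, vector in zip("MKVLA", features):
    print(residue, vector)
```

Each residue becomes one fixed-length row, so a protein of length n yields an n-by-window matrix. In a pipeline like the one the abstract describes, rows of this kind would be combined with the other descriptors before the oversampling, dimensionality reduction, and classification steps.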
Related articles
- DeepStack-DTIs: Predicting Drug-Target Interactions Using LightGBM Feature Selection and Deep-Stacked Ensemble Classifier.
  - Authors: Zhang Y, Jiang Z, Chen C, Wei Q, Gu H, Yu B
  - Issue date: 2022 Jun
- SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting.
  - Authors: Yu B, Qiu W, Chen C, Ma A, Jiang J, Zhou H, Ma Q
  - Issue date: 2020 Feb 15
- Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier.
  - Authors: Liu Y, Jin S, Song L, Han Y, Yu B
  - Issue date: 2021 Sep
- GTB-PPI: Predict Protein-protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting.
  - Authors: Yu B, Chen C, Zhou H, Liu B, Ma Q
  - Issue date: 2020 Oct
- SXGBsite: Prediction of Protein-Ligand Binding Sites Using Sequence Information and Extreme Gradient Boosting.
  - Authors: Zhao Z, Xu Y, Zhao Y
  - Issue date: 2019 Nov 22