
dc.contributor.author: Wang, Xue
dc.contributor.author: Zhang, Yaqun
dc.contributor.author: Yu, Bin
dc.contributor.author: Salhi, Adil
dc.contributor.author: Chen, Ruixin
dc.contributor.author: Wang, Lin
dc.contributor.author: Liu, Zengfeng
dc.date.accessioned: 2021-06-16T06:10:17Z
dc.date.available: 2021-06-16T06:10:17Z
dc.date.issued: 2021-06-01
dc.date.submitted: 2020-12-29
dc.identifier.citation: Wang, X., Zhang, Y., Yu, B., Salhi, A., Chen, R., Wang, L., & Liu, Z. (2021). Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis. Computers in Biology and Medicine, 134, 104516. doi:10.1016/j.compbiomed.2021.104516
dc.identifier.issn: 0010-4825
dc.identifier.pmid: 34119922
dc.identifier.doi: 10.1016/j.compbiomed.2021.104516
dc.identifier.uri: http://hdl.handle.net/10754/669594
dc.description.abstract: Predicting protein-protein interaction sites (PPI sites) can provide important clues for understanding biological activity. Using machine learning to predict PPI sites can mitigate the cost of running expensive and time-consuming biological experiments. Here we propose PPISP-XGBoost, a novel PPI sites prediction method based on eXtreme gradient boosting (XGBoost). First, the characteristic information of protein is extracted through the pseudo-position specific scoring matrix (PsePSSM), pseudo-amino acid composition (PseAAC), hydropathy index and solvent accessible surface area (ASA) under the sliding window. Next, these raw features are preprocessed to obtain more optimal representations in order to achieve better prediction. In particular, the synthetic minority oversampling technique (SMOTE) is used to circumvent class imbalance, and the kernel principal component analysis (KPCA) is applied to remove redundant characteristics. Finally, these optimal features are fed to the XGBoost classifier to identify PPI sites. Using PPISP-XGBoost, the prediction accuracy on the training dataset Dset186 reaches 85.4%, and the accuracy on the independent validation datasets Dtestset72, PDBtestset164, Dset_448 and Dset_355 reaches 85.3%, 83.9%, 85.8% and 85.4%, respectively, which all show an increase in accuracy against existing PPI sites prediction methods. These results demonstrate that the PPISP-XGBoost method can further enhance the prediction of PPI sites.
dc.description.sponsorship: This work was supported by the National Natural Science Foundation of China (No. 61863010), the Key Research and Development Program of Shandong Province of China (No. 2019GGX101001), and the Key Laboratory Open Foundation of Hainan Province (No. JSKX202001).
dc.publisher: Elsevier BV
dc.relation.url: https://linkinghub.elsevier.com/retrieve/pii/S0010482521003103
dc.rights: NOTICE: this is the author’s version of a work that was accepted for publication in Computers in biology and medicine. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computers in biology and medicine, [134, (2021-06-13)] DOI: 10.1016/j.compbiomed.2021.104516. © 2021. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Prediction of protein-protein interaction sites through eXtreme gradient boosting with kernel principal component analysis.
dc.type: Article
dc.contributor.department: Computational Bioscience Research Center (CBRC)
dc.contributor.department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.identifier.journal: Computers in biology and medicine
dc.rights.embargodate: 2022-06-13
dc.eprint.version: Post-print
dc.contributor.institution: College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China.
dc.contributor.institution: Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China.
dc.contributor.institution: Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
dc.identifier.volume: 134
dc.identifier.pages: 104516
kaust.person: Salhi, Adil
dc.date.accepted: 2021-05-24
dc.date.published-online: 2021-06-01
dc.date.published-print: 2021-07

