KAUST Department:
KAUST Catalysis Center (KCC)
Physical Science and Engineering (PSE) Division
Chemical Science Program
Permanent link to this record: http://hdl.handle.net/10754/669803
Abstract:
Herein, we present the results of a machine-learning approach we developed to single out correct 3D docking models of protein-protein complexes generated by popular docking software. To this end, we generated a set of ~7 × 10^6 docking models with three different docking programs (HADDOCK, FTDock and ZDOCK) for the 230 complexes in the protein-protein interaction benchmark, version 5 (BM5). Three different machine-learning algorithms (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers on 158 different scoring functions, used as features. The Random Forest algorithm outperformed the other two and was selected for further optimization. Applying a feature-selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named CoDES (COnservation Driven Expert System). Tests of CoDES on independent datasets, together with a comparison against machine-learning methods recently developed in the field for scoring docking decoys, confirm its state-of-the-art ability to discriminate correct from incorrect decoys, both in terms of global parameters and in terms of the decoys ranked at the top positions.
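The abstract assesses CoDES partly by how well correct decoys end up among the top-ranked positions. As a rough illustration of that kind of evaluation, the sketch below computes a top-N success rate: the fraction of complexes for which at least one correct decoy appears among the N decoys with the highest classifier probability. This is a minimal sketch under stated assumptions, not the authors' evaluation code. It assumes the classifier outputs one probability per decoy and that each decoy carries its complex identifier and a correct/incorrect label. The helper name `top_n_success_rate`, the data layout and the complex IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical decoy record: (complex_id, predicted probability of being correct, is_correct)
Decoy = Tuple[str, float, bool]


def top_n_success_rate(decoys: Iterable[Decoy], n: int) -> float:
    """Fraction of complexes with at least one correct decoy among their
    n highest-probability decoys (illustrative helper, not part of CoDES)."""
    # Group decoys by the complex they were generated for.
    by_complex: Dict[str, List[Tuple[float, bool]]] = defaultdict(list)
    for complex_id, prob, is_correct in decoys:
        by_complex[complex_id].append((prob, is_correct))
    if not by_complex:
        return 0.0

    hits = 0
    for ranked in by_complex.values():
        # Rank this complex's decoys by predicted probability, best first.
        ranked.sort(key=lambda item: item[0], reverse=True)
        # Count a hit if any correct decoy falls within the top n.
        if any(is_correct for _, is_correct in ranked[:n]):
            hits += 1
    return hits / len(by_complex)


# Made-up example: two hypothetical complexes, three scored decoys each.
example_decoys = [
    ("1ABC", 0.91, False),
    ("1ABC", 0.85, True),
    ("1ABC", 0.10, False),
    ("2XYZ", 0.70, False),
    ("2XYZ", 0.40, False),
    ("2XYZ", 0.35, True),
]

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, top_n_success_rate(example_decoys, n))  # 0.0, 0.5, 1.0
```

Grouping per complex before ranking matters: a global ranking across all complexes would let easy targets dominate the top positions, whereas a per-complex success rate asks whether the scorer helps for each target individually.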
Citation: Barradas-Bautista, D., Cao, Z., Vangone, A., Oliva, R., & Cavallo, L. (2021). A Random Forest Classifier for Protein-Protein Docking Models. doi:10.1101/2021.06.23.449420
Publisher: Cold Spring Harbor Laboratory