dc.contributor.author: Barradas Bautista, Didier
dc.contributor.author: Cao, Zhen
dc.contributor.author: Vangone, Anna
dc.contributor.author: Oliva, Romina
dc.contributor.author: Cavallo, Luigi
dc.date.accessioned: 2021-12-13T09:19:21Z
dc.date.available: 2021-06-28T07:26:33Z
dc.date.available: 2021-12-13T09:19:21Z
dc.date.issued: 2021-12-10
dc.identifier.citation: Barradas-Bautista, D., Cao, Z., Vangone, A., Oliva, R., & Cavallo, L. (2021). A Random Forest Classifier for Protein-Protein Docking Models. Bioinformatics Advances. doi:10.1093/bioadv/vbab042
dc.identifier.issn: 2635-0041
dc.identifier.doi: 10.1093/bioadv/vbab042
dc.identifier.doi: 10.1101/2021.06.23.449420
dc.identifier.uri: http://hdl.handle.net/10754/669803
dc.description.abstract: Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained by popular docking software. To this end, we generated 3 × 10⁴ docking models for each of the 230 complexes in the protein-protein benchmark, version 5 (BM5), using three different docking programs (HADDOCK, FTDock and ZDOCK), for a cumulative set of ≈ 7 × 10⁶ docking models. Three different machine-learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two algorithms and was selected for further optimization. Using a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named CoDES (COnservation Driven Expert System). Testing of CoDES on independent datasets, as well as its comparative performance against machine-learning methods recently developed in the field for the scoring of docking decoys, confirms its state-of-the-art ability to discriminate correct from incorrect decoys, both in terms of global parameters and in terms of decoys ranked at the top positions.
dc.description.sponsorship: The IRaPPA dataset was a courtesy of the methods authors Iain H. Moal and Juan Fernandez-Recio. LC thanks the Supercomputing Laboratory at the King Abdullah University of Science and Technology (KAUST) for technical support and access to the Shaheen facilities. DBB was supported by funding from the AI Initiative at KAUST.
dc.publisher: Oxford University Press (OUP)
dc.relation.url: https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbab042/6459166
dc.rights: This is a pre-copyedited, author-produced PDF of an article accepted for publication in Bioinformatics Advances following peer review. The version of record is available online at: https://academic.oup.com/bioinformaticsadvances/advance-article/doi/10.1093/bioadv/vbab042/6459166.
dc.rights: © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: A Random Forest Classifier for Protein-Protein Docking Models
dc.type: Article
dc.contributor.department: KAUST Catalysis Center (KCC)
dc.contributor.department: Chemical Science Program
dc.contributor.department: Physical Science and Engineering (PSE) Division
dc.identifier.journal: Bioinformatics Advances
dc.eprint.version: Post-print
dc.contributor.institution: Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Munich, Large Molecule Research, Nonnenwald 2, 82377, Penzberg, Germany
dc.contributor.institution: Department of Sciences and Technologies, University Parthenope of Naples, Centro Direzionale Isola C4, I-80143, Naples, Italy
kaust.person: Barradas Bautista, Didier
kaust.person: Cao, Zhen
kaust.person: Cavallo, Luigi
dc.relation.issupplementedby: DOI:10.5281/zenodo.4012018
refterms.dateFOA: 2021-12-13T09:19:21Z
display.relations: Is Supplemented By: [Dataset] Barradas-Bautista, D., Oliva, R., & Cavallo, L. (2020). A protein-protein docking decoys set from three different rigid body methods (one) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4012018. DOI: 10.5281/zenodo.4012018. Handle: 10754/674130
kaust.acknowledged.supportUnit: Shaheen
kaust.acknowledged.supportUnit: Supercomputing Laboratory


Files in this item

Name: vbab042.pdf
Size: 3.252Mb
Format: PDF
Description: Accepted manuscript

Name: vbab042_supplementary_data.pdf
Size: 199.4Kb
Format: PDF
Description: Supplementary data
