LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
KAUST Department: Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
Online Publication Date: 2014-12-03
Print Publication Date: 2014
Permanent link to this record: http://hdl.handle.net/10754/344396
Abstract:
Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available.
Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets, and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor.
Conclusions: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
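The balanced-ensemble idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 1:1 undersampling ratio, and the score averaging are assumptions made for illustration, and the base learner is a simple nearest-centroid scorer standing in for a random forest so that the example needs only the Python standard library.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # maps a feature vector to a binding score in [0, 1]


def balanced_subsets(labels: Sequence[int], n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Build index lists that pair every positive with an equal-sized random draw of negatives.

    The 1:1 ratio is an assumption; the abstract only says the sets are balanced.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or len(neg) < len(pos):
        raise ValueError("need at least one positive and no fewer negatives than positives")
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


def train_centroid(X: Sequence[Vector], y: Sequence[int]) -> Model:
    """Stand-in base learner (NOT a random forest): score by relative distance to class centroids."""
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos = mean([x for x, t in zip(X, y) if t == 1])
    c_neg = mean([x for x, t in zip(X, y) if t == 0])

    def score(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        total = d_pos + d_neg
        # Closer to the positive centroid gives a score nearer to 1.
        return 0.5 if total == 0 else d_neg / total

    return score


def train_ensemble(X, y, n_subsets: int, trainer=train_centroid, seed: int = 0) -> List[Model]:
    """Train one base model per balanced subset."""
    models = []
    for idx in balanced_subsets(y, n_subsets, seed):
        models.append(trainer([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models: Sequence[Model], x: Vector) -> float:
    """Average the per-model scores (soft voting, an assumed combination rule)."""
    return sum(m(x) for m in models) / len(models)
```

To get closer to the setup the abstract describes, `trainer` could be replaced by a function that fits a random forest on each balanced subset and returns its positive-class probability.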
Sponsors: This work was supported by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). This work was also supported by the National Natural Science Foundation of China (Nos. 61300058, 61374181 and 61472282). Publication charges for this article have been funded by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST).
Conference/Event name: 2013 International Conference on Intelligent Computing (ICIC 2013)