LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
Type
Conference Paper
KAUST Department
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
KAUST Grant Number
GRP-CF-2011-19-P-Gao-Huang
KUS-CI-016-04
Date
2014-12-03
Online Publication Date
2014-12-03
Print Publication Date
2014
Permanent link to this record
http://hdl.handle.net/10754/344396
Abstract
Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods still search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available.
Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor.
Conclusions: Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
Citation
Chen, P., Huang, J. Z., & Gao, X. (2014). LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone. BMC Bioinformatics, 15(Suppl 15), S4. doi:10.1186/1471-2105-15-s15-s4
Sponsors
This work was supported by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). This work was also supported by the National Natural Science Foundation of China (Nos. 61300058, 61374181 and 61472282). Publication charges for this article were funded by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST).
Publisher
Springer Nature
Journal
BMC Bioinformatics
Conference/Event name
2013 International Conference on Intelligent Computing (ICIC 2013)
PubMed ID
25474163
Additional Links
http://www.biomedcentral.com/1471-2105/15/S15/S4
DOI
10.1186/1471-2105-15-S15-S4
Related articles
- A Sequence-Based Dynamic Ensemble Learning System for Protein Ligand-Binding Site Prediction.
- Authors: Chen P, Hu S, Zhang J, Gao X, Li J, Xia J, Wang B
- Issue date: 2016 Sep-Oct
- Predicting small ligand binding sites in proteins using backbone structure.
- Authors: Bordner AJ
- Issue date: 2008 Dec 15
- Prediction of ligand binding sites using homologous structures and conservation at CASP8.
- Authors: Wass MN, Sternberg MJ
- Issue date: 2009
- Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs.
- Authors: Shamim MT, Anwaruddin M, Nagarajaram HA
- Issue date: 2007 Dec 15
- Protein-binding site prediction based on three-dimensional protein modeling.
- Authors: Oh M, Joo K, Lee J
- Issue date: 2009