LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

Handle URI:
http://hdl.handle.net/10754/344396
Title:
LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
Authors:
Chen, Peng; Huang, Jianhua Z; Gao, Xin (ORCID: 0000-0002-7108-3574)
Abstract:
Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for similar, known structures of the query and predict the binding sites from the solved structures. However, such structural information is not commonly available. Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because ligand-binding and non-ligand-binding sites are highly imbalanced in number, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
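The abstract names three ingredients: sliding residue windows, several balanced data sets, and an ensemble of classifiers. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. Everything here is an assumption made for illustration: the function names, the padding character "X", combining window sizes by simple concatenation, building balanced sets by randomly undersampling the non-binding residues, and combining predictions by majority vote. Training the random forests themselves is omitted.

```python
import random


def window_features(sequence, center, half_width, pad="X"):
    """Return the residues in a window centred on `center`.

    The sequence is padded with `pad` (an assumed placeholder) so that
    residues near either end still get a full-length window.
    """
    padded = pad * half_width + sequence + pad * half_width
    return padded[center : center + 2 * half_width + 1]


def combined_features(sequence, center, half_widths=(1, 2)):
    """Join windows of several sizes for one residue (assumed combination rule)."""
    return "|".join(window_features(sequence, center, h) for h in half_widths)


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Build `n_subsets` balanced sets by undersampling the negatives.

    Each set keeps every positive (binding) example and draws an equally
    sized random sample of negative (non-binding) examples.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(negatives, len(positives))
        subsets.append((list(positives), sampled))
    return subsets


def majority_vote(votes):
    """Combine per-classifier boolean predictions; ties count as non-binding."""
    return sum(votes) * 2 > len(votes)
```

In a fuller pipeline, one classifier would be trained on each balanced subset and `majority_vote` would be applied to their per-residue predictions; the paper itself should be consulted for the actual encoding and combination rules.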
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Publisher:
BioMed Central
Journal:
BMC Bioinformatics (part of the supplement: Proceedings of the 2013 International Conference on Intelligent Computing (ICIC 2013))
Conference/Event name:
2013 International Conference on Intelligent Computing (ICIC 2013)
Issue Date:
3-Dec-2014
DOI:
10.1186/1471-2105-15-S15-S4
Type:
Conference Paper
Sponsors:
This work was supported by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). This work was also supported by the National Natural Science Foundation of China (Nos. 61300058, 61374181 and 61472282). Publication charges for this article have been funded by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST).
Additional Links:
http://www.biomedcentral.com/1471-2105/15/S15/S4
Appears in Collections:
Conference Papers; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Chen, Peng | en
dc.contributor.author | Huang, Jianhua Z | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2015-02-11T12:15:54Z | -
dc.date.available | 2015-02-11T12:15:54Z | -
dc.date.issued | 2014-12-03 | en
dc.identifier.doi | 10.1186/1471-2105-15-S15-S4 | en
dc.identifier.uri | http://hdl.handle.net/10754/344396 | en
dc.description.abstract | Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for similar, known structures of the query and predict the binding sites from the solved structures. However, such structural information is not commonly available. Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because ligand-binding and non-ligand-binding sites are highly imbalanced in number, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods. | en
dc.description.sponsorship | This work was supported by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). This work was also supported by the National Natural Science Foundation of China (Nos. 61300058, 61374181 and 61472282). Publication charges for this article have been funded by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). | en
dc.publisher | BioMed Central | en
dc.relation.url | http://www.biomedcentral.com/1471-2105/15/S15/S4 | en
dc.rights | © 2014 Chen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. | en
dc.title | LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone | en
dc.type | Conference Paper | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | BMC Bioinformatics (part of the supplement: Proceedings of the 2013 International Conference on Intelligent Computing (ICIC 2013)) | en
dc.conference.name | 2013 International Conference on Intelligent Computing (ICIC 2013) | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Institute of Health Sciences, Anhui University, Hefei, Anhui 230601, China | en
dc.contributor.institution | Department of Statistics, Texas A&M University, College Station, TX 77843-3143, USA | en
kaust.author | Chen, Peng | en
kaust.author | Gao, Xin | en
This item is licensed under a Creative Commons License
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.