Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences

Handle URI:
http://hdl.handle.net/10754/562868
Title:
Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences
Authors:
Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z.; Gao, Xin ( 0000-0002-7108-3574 )
Abstract:
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequence and structural information have been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature, combined with a novel encoding scheme, was used to train a hot spot classification model with the IBk algorithm, an extension of the K-nearest neighbor algorithm. Combinations of these individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of the selected classifiers that makes predictions by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc.
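To make the ensemble idea in the abstract more concrete, the following is a minimal Python sketch of one k-nearest-neighbour classifier per physicochemical scale, combined by majority vote. It is not the authors' program. The window encoding (`encode`), the three toy scales and the four training examples are hypothetical placeholders, because this record does not describe the paper's encoding scheme or list the 132 selected AAindex1 features. Weka's IBk is an instance-based kNN learner; a plain Euclidean kNN stands in for it here.

```python
from collections import Counter
from math import dist  # Python 3.8+


def encode(seq, pos, scale, window=2):
    """Encode residue `pos` as the scale values of its +/-window neighbourhood.

    Positions outside the sequence and residues missing from `scale` map to 0.0.
    This sliding-window encoding is an illustrative assumption, not the
    paper's encoding scheme.
    """
    return [scale.get(seq[j], 0.0) if 0 <= j < len(seq) else 0.0
            for j in range(pos - window, pos + window + 1)]


def knn_predict(train, query, k=3):
    """Plain k-nearest-neighbour vote using Euclidean distance.

    Ties among labels are broken by whichever label was counted first.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(examples, seq, pos, scales, k=3, window=2):
    """Majority vote over one kNN classifier per physicochemical scale.

    `examples` is a list of (sequence, position, label) tuples.
    """
    votes = []
    for scale in scales.values():
        train = [(encode(s, p, scale, window), y) for s, p, y in examples]
        votes.append(knn_predict(train, encode(seq, pos, scale, window), k))
    return Counter(votes).most_common(1)[0][0]


# Toy scales with made-up values (NOT taken from AAindex1).
scales = {
    "hydro_toy": {"W": 1.0, "Y": 0.9, "R": 0.2, "A": 0.1, "G": 0.0},
    "size_toy": {"W": 1.0, "Y": 0.8, "R": 0.7, "A": 0.2, "G": 0.0},
    "polar_toy": {"W": 0.1, "Y": 0.5, "R": 1.0, "A": 0.0, "G": 0.3},
}

# Hypothetical labelled residues: (sequence, index of the residue, label).
examples = [
    ("GAWYG", 2, "hot"),
    ("GYWAG", 2, "hot"),
    ("GAGAG", 2, "non-hot"),
    ("AGAGA", 2, "non-hot"),
]

if __name__ == "__main__":
    print(ensemble_predict(examples, "GAWWG", 2, scales, k=1))
    print(ensemble_predict(examples, "GAGGG", 2, scales, k=1))
```

In this toy run with `k=1`, the `hydro_toy` and `size_toy` classifiers vote `hot` for the central residue of `GAWWG` while `polar_toy` votes `non-hot`, so the ensemble returns `hot`. Each scale contributes one vote regardless of how the others decide.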
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computational Bioscience Research Center (CBRC); Computer Science Program; Structural and Functional Bioinformatics Group
Publisher:
Wiley-Blackwell
Journal:
Proteins: Structure, Function, and Bioinformatics
Issue Date:
23-Jul-2013
DOI:
10.1002/prot.24278
PubMed ID:
23504705
Type:
Article
ISSN:
0887-3585
Sponsors:
Grant sponsor: King Abdullah University of Science and Technology (KAUST); Grant numbers: KUS-CI-016-04; GRP-CF-2011-19-P-Gao-Huang.
Appears in Collections:
Articles; Structural and Functional Bioinformatics Group; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Chen, Peng | en
dc.contributor.author | Li, Jinyan | en
dc.contributor.author | Wong, Limsoon | en
dc.contributor.author | Kuwahara, Hiroyuki | en
dc.contributor.author | Huang, Jianhua Z. | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2015-08-03T11:13:26Z | en
dc.date.available | 2015-08-03T11:13:26Z | en
dc.date.issued | 2013-07-23 | en
dc.identifier.issn | 0887-3585 | en
dc.identifier.pmid | 23504705 | en
dc.identifier.doi | 10.1002/prot.24278 | en
dc.identifier.uri | http://hdl.handle.net/10754/562868 | en
dc.description.abstract | Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequence and structural information have been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature, combined with a novel encoding scheme, was used to train a hot spot classification model with the IBk algorithm, an extension of the K-nearest neighbor algorithm. Combinations of these individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of the selected classifiers that makes predictions by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013 Wiley Periodicals, Inc. | en
dc.description.sponsorship | Grant sponsor: King Abdullah University of Science and Technology (KAUST); Grant numbers: KUS-CI-016-04; GRP-CF-2011-19-P-Gao-Huang. | en
dc.publisher | Wiley-Blackwell | en
dc.subject | Classification | en
dc.subject | Feature selection | en
dc.subject | Hot spot residue | en
dc.subject | Physicochemical characteristic | en
dc.subject | Protein-protein interaction | en
dc.title | Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Structural and Functional Bioinformatics Group | en
dc.identifier.journal | Proteins: Structure, Function, and Bioinformatics | en
dc.contributor.institution | Advanced Analytics Institute, University of Technology, Sydney, NSW, Australia | en
dc.contributor.institution | School of Computing, National University of Singapore, 117417, Singapore | en
dc.contributor.institution | Department of Statistics, Texas A&M University, College Station, TX, United States | en
kaust.author | Chen, Peng | en
kaust.author | Kuwahara, Hiroyuki | en
kaust.author | Gao, Xin | en