ProClusEnsem: Predicting membrane protein types by fusing different modes of pseudo amino acid composition

Handle URI:
http://hdl.handle.net/10754/562173
Title:
ProClusEnsem: Predicting membrane protein types by fusing different modes of pseudo amino acid composition
Authors:
Wang, Jim Jing-Yan; Li, Yongping; Wang, Quanquan; You, Xinge; Man, Jiaju; Wang, Chao; Gao, Xin (0000-0002-7108-3574)
Abstract:
Knowing the type of an uncharacterized membrane protein often provides a useful clue in both basic research and drug discovery. With the explosion of protein sequences generated in the post-genomic era, determining membrane protein types by experimental methods is expensive and time-consuming. It is therefore important to develop an automated method for finding the possible types of membrane proteins. To this end, various computational methods for membrane protein type prediction have been proposed. They extract protein feature vectors, such as PseAAC (pseudo amino acid composition) and PsePSSM (pseudo position-specific scoring matrix), to represent the protein sequence, and then learn a distance metric for the KNN (K nearest neighbor) or NN (nearest neighbor) classifier that predicts the final type. Most of these metrics are learned using linear dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Such metrics are common to all the proteins in the dataset. In effect, they assume that the proteins follow a uniform distribution that a linear dimensionality reduction algorithm can capture. We question this assumption and instead learn local metrics, each optimized for a local subset of the proteins. Metric learning is iterated with protein clustering. A novel ensemble distance metric is then obtained by combining the local metrics through Tikhonov regularization. Experimental results on a benchmark dataset demonstrate the feasibility and effectiveness of the proposed algorithm, named ProClusEnsem. © 2012 Elsevier Ltd.
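The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: one linear metric per cluster, a Tikhonov-regularized combination of those metrics, and nearest-neighbor prediction under the combined distance. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the use of PCA for each local metric, cluster assignments supplied by the caller, and the fitting target (0 for same-type pairs, 1 for different-type pairs).

```python
import numpy as np

def pca_projection(X, n_components):
    """Return a (d x n_components) projection matrix learned by PCA on X."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components].T

def local_projections(X, cluster_ids, n_components):
    """Learn one PCA projection per cluster (cluster_ids is assumed given)."""
    return [pca_projection(X[cluster_ids == c], n_components)
            for c in np.unique(cluster_ids)]

def metric_distance(W, x, y):
    """Distance induced by a linear projection W: ||W^T (x - y)||."""
    return float(np.linalg.norm(W.T @ (x - y)))

def pairwise_targets(X, y, projections):
    """Hypothetical fitting problem: one row per sample pair, one column per
    local metric; target is 0 for same-type pairs and 1 otherwise."""
    rows, targets = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            rows.append([metric_distance(W, X[i], X[j]) for W in projections])
            targets.append(0.0 if y[i] == y[j] else 1.0)
    return np.array(rows), np.array(targets)

def tikhonov_weights(A, b, lam):
    """Closed-form solution of min_w ||A w - b||^2 + lam * ||w||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def ensemble_distance(projections, weights, x, y):
    """Weighted sum of the local metric distances between x and y."""
    return sum(w * metric_distance(W, x, y)
               for w, W in zip(weights, projections))

def nn_predict(X_train, y_train, x, projections, weights):
    """1-NN prediction under the ensemble distance."""
    dists = [ensemble_distance(projections, weights, x, xt) for xt in X_train]
    return y_train[int(np.argmin(dists))]
```

In this sketch, a caller would cluster the training feature vectors, pass the cluster labels to `local_projections`, build the pair matrix with `pairwise_targets`, fit the weights with `tikhonov_weights`, and classify a query with `nn_predict`. The regularization strength `lam` keeps the linear system well conditioned when the local distances are strongly correlated. Note that the fitted weights are not constrained to be non-negative, so the combined score is not guaranteed to be a true metric.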
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program; Computational Bioscience Research Center (CBRC); Structural and Functional Bioinformatics Group
Publisher:
Elsevier BV
Journal:
Computers in Biology and Medicine
Issue Date:
May-2012
DOI:
10.1016/j.compbiomed.2012.01.012
PubMed ID:
22386149
Type:
Article
ISSN:
0010-4825
Sponsors:
The study was supported by grants from Shanghai Key Laboratory of Intelligent Information Processing, China (Grant No. IIPL-2011-003), Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China (Grant No. HS201107), National Grand Fundamental Research (973) Program of China (Grant Nos. 2010CB834303 and 2011CB911102), National Natural Science Foundation of China (Grant No. 60973154), Hubei Provincial Science Foundation, China (Grant Nos. 2010CDA006 and 2010CD06601), and a grant from King Abdullah University of Science and Technology.
Appears in Collections:
Articles; Structural and Functional Bioinformatics Group; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wang, Jim Jing-Yan | en
dc.contributor.author | Li, Yongping | en
dc.contributor.author | Wang, Quanquan | en
dc.contributor.author | You, Xinge | en
dc.contributor.author | Man, Jiaju | en
dc.contributor.author | Wang, Chao | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2015-08-03T09:46:29Z | en
dc.date.available | 2015-08-03T09:46:29Z | en
dc.date.issued | 2012-05 | en
dc.identifier.issn | 0010-4825 | en
dc.identifier.pmid | 22386149 | en
dc.identifier.doi | 10.1016/j.compbiomed.2012.01.012 | en
dc.identifier.uri | http://hdl.handle.net/10754/562173 | en
dc.description.abstract | Knowing the type of an uncharacterized membrane protein often provides a useful clue in both basic research and drug discovery. With the explosion of protein sequences generated in the post-genomic era, determining membrane protein types by experimental methods is expensive and time-consuming. It is therefore important to develop an automated method for finding the possible types of membrane proteins. To this end, various computational methods for membrane protein type prediction have been proposed. They extract protein feature vectors, such as PseAAC (pseudo amino acid composition) and PsePSSM (pseudo position-specific scoring matrix), to represent the protein sequence, and then learn a distance metric for the KNN (K nearest neighbor) or NN (nearest neighbor) classifier that predicts the final type. Most of these metrics are learned using linear dimensionality reduction algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Such metrics are common to all the proteins in the dataset. In effect, they assume that the proteins follow a uniform distribution that a linear dimensionality reduction algorithm can capture. We question this assumption and instead learn local metrics, each optimized for a local subset of the proteins. Metric learning is iterated with protein clustering. A novel ensemble distance metric is then obtained by combining the local metrics through Tikhonov regularization. Experimental results on a benchmark dataset demonstrate the feasibility and effectiveness of the proposed algorithm, named ProClusEnsem. © 2012 Elsevier Ltd. | en
dc.description.sponsorship | The study was supported by grants from Shanghai Key Laboratory of Intelligent Information Processing, China (Grant No. IIPL-2011-003), Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China (Grant No. HS201107), National Grand Fundamental Research (973) Program of China (Grant Nos. 2010CB834303 and 2011CB911102), National Natural Science Foundation of China (Grant No. 60973154), Hubei Provincial Science Foundation, China (Grant Nos. 2010CDA006 and 2010CD06601), and a grant from King Abdullah University of Science and Technology. | en
dc.publisher | Elsevier BV | en
dc.subject | Clustering | en
dc.subject | Distance metric | en
dc.subject | Linear dimensionality reduction | en
dc.subject | Membrane protein types prediction | en
dc.subject | Pseudo amino acid composition | en
dc.subject | Tikhonov regularization | en
dc.title | ProClusEnsem: Predicting membrane protein types by fusing different modes of pseudo amino acid composition | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Structural and Functional Bioinformatics Group | en
dc.identifier.journal | Computers in Biology and Medicine | en
dc.contributor.institution | Shanghai Institute of Applied Physics, Chinese Academy of Science, 2019 Jialuo Road, Jiading District, Shanghai 201800, China | en
dc.contributor.institution | Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China | en
dc.contributor.institution | Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China | en
dc.contributor.institution | Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China, College of Mathematics and Computer Science, Hunan Normal University, Changsha, Hunan 410081, China | en
dc.contributor.institution | Department of Biomedical Engineering, Oregon Health and Science University, 20000 NW Walker Rd., Beaverton, OR 97006, United States | en
kaust.author | Wang, Jim Jing-Yan | en
kaust.author | Gao, Xin | en