ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

Handle URI:
http://hdl.handle.net/10754/325472
Title:
ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
Authors:
Wang, Jim Jing-Yan; Gao, Xin (0000-0002-7108-3574); Wang, Quanquan; Li, Yongping
Abstract:
Background: The need to retrieve or classify protein molecules using structure- or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of the other proteins in the database and therefore cannot meet the accuracy requirements of retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms are unsupervised and do not use the known class labels of the proteins in the database.
Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure d_ij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing d_ij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select relevant protein pairs (pairs with the same class label) and irrelevant protein pairs (pairs with different labels), and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that retrieval systems using our supervised contextual dissimilarity measures significantly outperform those using context-free dissimilarity/similarity measures or unsupervised contextual dissimilarity measures that do not use the class label information.
Conclusions: Using the contextual proteins with their class labels in the database, we can improve the accuracy of pairwise dissimilarity/similarity measures dramatically for protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among the different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature. © 2012 Wang et al.; licensee BioMed Central Ltd.
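The steps named in the Results paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the formulas, so several choices here are assumptions made only for illustration: contexts are taken as nested k-nearest-neighbor sets (the `sizes` parameter), each entry of the contextual dissimilarity vector is the mean dissimilarity between two sub-contexts, the SVM is a tiny linear one trained by subgradient descent, and the regularizing factor is a logistic squashing of the SVM score. All function names are hypothetical.

```python
import math

def nested_contexts(D, i, sizes):
    """Nested neighbor sets of protein i, one per hierarchy level (assumed: k-NN)."""
    order = sorted((j for j in range(len(D)) if j != i), key=lambda j: D[i][j])
    return [order[:k] for k in sizes]

def context_vector(D, i, j, sizes):
    """Contextual dissimilarity vector of pair (i, j): one entry per level,
    here the mean dissimilarity between the two sub-contexts (an assumption)."""
    vec = []
    for Ni, Nj in zip(nested_contexts(D, i, sizes), nested_contexts(D, j, sizes)):
        vec.append(sum(D[a][b] for a in Ni for b in Nj) / (len(Ni) * len(Nj)))
    return vec

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Small linear SVM (hinge loss, subgradient descent); labels y are +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wk * xk for wk, xk in zip(w, x)) + b)
            for k in range(len(w)):
                grad = lam * w[k] - (t * x[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
            if margin < 1:
                b += lr * t
    return w, b

def regularize(D, labels, sizes=(2, 3), iterations=2):
    """Iteratively rescale D by a supervised factor learned from pair contexts."""
    n = len(D)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(iterations):
        # Contexts are recomputed from the current (updated) dissimilarities.
        X = [context_vector(D, i, j, sizes) for i, j in pairs]
        # +1 marks an irrelevant pair (different labels), -1 a relevant pair.
        y = [1 if labels[i] != labels[j] else -1 for i, j in pairs]
        w, b = train_linear_svm(X, y)
        new_D = [row[:] for row in D]
        for (i, j), x in zip(pairs, X):
            score = sum(wk * xk for wk, xk in zip(w, x)) + b
            score = max(-50.0, min(50.0, score))      # avoid overflow in exp
            factor = 1.0 / (1.0 + math.exp(-score))   # assumed mapping to (0, 1]
            new_D[i][j] = new_D[j][i] = D[i][j] * factor
        D = new_D
    return D

# Toy usage: six "proteins" on a line, two classes of three.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
toy_D = [[abs(a - b) for b in positions] for a in positions]
toy_labels = [0, 0, 0, 1, 1, 1]
learned_D = regularize(toy_D, toy_labels)
```

In this sketch, pairs whose contexts look like those of same-class pairs receive a smaller factor, so their dissimilarity shrinks more, and the contexts are rebuilt from the updated matrix on the next pass. A real retrieval system would train only on database proteins with known labels and apply the learned factor to query-database pairs.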
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Wang J, Gao X, Wang Q, Li Y (2012) ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval. BMC Bioinformatics 13: S2. doi:10.1186/1471-2105-13-S7-S2.
Publisher:
Springer Nature
Journal:
BMC Bioinformatics
Issue Date:
8-May-2012
DOI:
10.1186/1471-2105-13-S7-S2
PubMed ID:
22594999
PubMed Central ID:
PMC3348016
Type:
Article
ISSN:
1471-2105
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wang, Jim Jing-Yan | en
dc.contributor.author | Gao, Xin | en
dc.contributor.author | Wang, Quanquan | en
dc.contributor.author | Li, Yongping | en
dc.date.accessioned | 2014-08-27T09:52:52Z | -
dc.date.available | 2014-08-27T09:52:52Z | -
dc.date.issued | 2012-05-08 | en
dc.identifier.citation | Wang J, Gao X, Wang Q, Li Y (2012) ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval. BMC Bioinformatics 13: S2. doi:10.1186/1471-2105-13-S7-S2. | en
dc.identifier.issn | 1471-2105 | en
dc.identifier.pmid | 22594999 | en
dc.identifier.doi | 10.1186/1471-2105-13-S7-S2 | en
dc.identifier.uri | http://hdl.handle.net/10754/325472 | en
dc.language.iso | en | en
dc.publisher | Springer Nature | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/2.0 | en
dc.subject | Biomedical applications | en
dc.subject | Class label informations | en
dc.subject | Contextual information | en
dc.subject | Dissimilarity measures | en
dc.subject | Dissimilarity vectors | en
dc.subject | Machine learning communities | en
dc.subject | Retrieval performance | en
dc.subject | Similarity measure | en
dc.subject | Benchmarking | en
dc.subject | Database systems | en
dc.subject | Information retrieval | en
dc.subject | Learning algorithms | en
dc.subject | Medical applications | en
dc.subject | Proteins | en
dc.subject | protein | en
dc.subject | algorithm | en
dc.subject | artificial intelligence | en
dc.subject | chemical model | en
dc.subject | chemistry | en
dc.subject | classification | en
dc.subject | isolation and purification | en
dc.subject | protein database | en
dc.subject | Algorithms | en
dc.subject | Artificial Intelligence | en
dc.subject | Databases, Protein | en
dc.subject | Models, Chemical | en
dc.subject | Proteins | en
dc.title | ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | BMC Bioinformatics | en
dc.identifier.pmcid | PMC3348016 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Shanghai Institute of Applied Physics, Chinese Academy of Sciences, 2019 Jialuo Road, Jiading District, Shanghai 201800, China | en
dc.contributor.institution | Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Wang, Jim Jing-Yan | en
kaust.author | Gao, Xin | en

This item is licensed under a Creative Commons License