ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
Abstract
Background: The need to retrieve or classify protein molecules using structure- or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Such pairwise measures neglect the distribution of the other proteins in the database and thus cannot deliver the high accuracy that retrieval systems need. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms are unsupervised and do not use the known class labels of proteins in the database.
Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure d_ij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing d_ij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain a contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select relevant protein pairs (pairs with the same class label) and irrelevant protein pairs (pairs with different labels), and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that retrieval systems using our supervised contextual dissimilarity measures significantly outperform those using context-free dissimilarity/similarity measures or unsupervised contextual dissimilarity measures that do not use the class label information.
Conclusions: Using the contextual proteins with their class labels in the database, we can improve the accuracy of pairwise dissimilarity/similarity measures dramatically for protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among the contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature. © 2012 Wang et al.; licensee BioMed Central Ltd.
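The context-based regularization described in the Results can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the formulas, so the choices here are assumptions made for illustration only. The context N(i) is taken to be the k nearest proteins (including i itself), the hierarchical sub-contexts are nested neighborhoods of increasing size, each entry of the contextual dissimilarity vector is one minus the Jaccard overlap of the two neighborhoods, and the regularizing factor is simply the mean of that vector. The function names `neighbors`, `context_vector`, and `regularize` are hypothetical. In the supervised variant described above, the mean would be replaced by a factor derived from an SVM trained on the vectors of relevant and irrelevant pairs, and the update would be iterated.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def neighbors(D: Matrix, i: int, k: int) -> Set[int]:
    """Context N(i): the k proteins closest to protein i (assumed to include i)."""
    order = sorted(range(len(D)), key=lambda j: D[i][j])
    return set(order[:k])


def context_vector(D: Matrix, i: int, j: int, sizes: Sequence[int]) -> List[float]:
    """One entry per nested sub-context: 1 - Jaccard overlap of the two neighborhoods."""
    vec = []
    for k in sizes:
        ni, nj = neighbors(D, i, k), neighbors(D, j, k)
        vec.append(1.0 - len(ni & nj) / len(ni | nj))
    return vec


def regularize(D: Matrix, sizes: Sequence[int] = (2, 3)) -> Matrix:
    """Scale each d_ij by an unsupervised factor computed from the two contexts.

    Assumption: factor = 1 + mean(context vector), so pairs with identical
    contexts keep their dissimilarity and pairs with disjoint contexts double it.
    """
    n = len(D)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                vec = context_vector(D, i, j, sizes)
                out[i][j] = D[i][j] * (1.0 + sum(vec) / len(vec))
    return out


if __name__ == "__main__":
    # Toy database: two tight groups of three proteins each.
    toy = [[0.0 if a == b else (1.0 if a // 3 == b // 3 else 5.0)
            for b in range(6)] for a in range(6)]
    new = regularize(toy, sizes=(3,))
    print(new[0][1], new[0][3])
```

In the toy example, proteins in the same group share a context, so their dissimilarity is left unchanged, while proteins from different groups have disjoint contexts and are pushed further apart.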
Citation: Wang J, Gao X, Wang Q, Li Y (2012) ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval. BMC Bioinformatics 13: S2. doi:10.1186/1471-2105-13-S7-S2.
PubMed Central ID: PMC3348016
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- A similarity learning approach to content-based image retrieval: application to digital mammography.
- Authors: El-Naqa I, Yang Y, Galatsanos NP, Nishikawa RM, Wernick MN
- Issue date: 2004 Oct
- A boosting framework for visuality-preserving distance metric learning and its application to medical image retrieval.
- Authors: Yang L, Jin R, Mummert L, Sukthankar R, Goode A, Zheng B, Hoi SC, Satyanarayanan M
- Issue date: 2010 Jan
- Benchmarking protein classification algorithms via supervised cross-validation.
- Authors: Kertész-Farkas A, Dhir S, Sonego P, Pacurar M, Netoteia S, Nijveen H, Kuzniar A, Leunissen JA, Kocsor A, Pongor S
- Issue date: 2008 Apr 24
- SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
- Authors: Melvin I, Ie E, Kuang R, Weston J, Stafford WN, Leslie C
- Issue date: 2007 May 22
- Protein ranking by semi-supervised network propagation.
- Authors: Weston J, Kuang R, Leslie C, Noble WS
- Issue date: 2006 Mar 20
Showing items related by title, author, creator and subject.
Interaction between the triglyceride lipase ATGL and the arf1 activator GBF1
Ellong, Emy Njoh; Soni, Krishnakant G.; Bui, Quynh-Trang; Sougrat, Rachid; Golinelli-Cohen, Marie-Pierre; Jackson, Catherine L. (Public Library of Science (PLoS), 2011-07-18)
The Arf1 exchange factor GBF1 (Golgi Brefeldin A resistance factor 1) and its effector COPI are required for delivery of ATGL (adipose triglyceride lipase) to lipid droplets (LDs). Using yeast two hybrid, co-immunoprecipitation in mammalian cells and direct protein binding approaches, we report here that GBF1 and ATGL interact directly and in cells, through multiple contact sites on each protein. The C-terminal region of ATGL interacts with N-terminal domains of GBF1, including the catalytic Sec7 domain, but not with full-length GBF1 or its entire N-terminus. The N-terminal lipase domain of ATGL (called the patatin domain) interacts with two C-terminal domains of GBF1, HDS (Homology downstream of Sec7) 1 and HDS2. These two domains of GBF1 localize to lipid droplets when expressed alone in cells, but not to the Golgi, unlike the full-length GBF1 protein, which localizes to both. We suggest that interaction of GBF1 with ATGL may be involved in the membrane trafficking pathway mediated by GBF1, Arf1 and COPI that contributes to the localization of ATGL to lipid droplets.
Dissecting the interactions of SERRATE with RNA and DICER-LIKE 1 in Arabidopsis microRNA precursor processing
Iwata, Yuji; Takahashi, Masateru; Fedoroff, Nina V.; Hamdan, Samir (Oxford University Press (OUP), 2013-08-05)
Efficient and precise microRNA (miRNA) biogenesis in Arabidopsis is mediated by the RNaseIII-family enzyme DICER-LIKE 1 (DCL1), double-stranded RNA-binding protein HYPONASTIC LEAVES 1 and the zinc-finger (ZnF) domain-containing protein SERRATE (SE). In the present study, we examined primary miRNA precursor (pri-miRNA) processing by highly purified recombinant DCL1 and SE proteins and found that SE is integral to pri-miRNA processing by DCL1. SE stimulates DCL1 cleavage of the pri-miRNA in an ionic strength-dependent manner. SE uses its N-terminal domain to bind to RNA and requires both N-terminal and ZnF domains to bind to DCL1. However, when DCL1 is bound to RNA, the interaction with the ZnF domain of SE becomes indispensable and stimulates the activity of DCL1 without requiring SE binding to RNA. Our results suggest that the interactions among SE, DCL1 and RNA are a potential point for regulating pri-miRNA processing. © The Author(s) 2013.
Solution Structure of the Tandem Acyl Carrier Protein Domains from a Polyunsaturated Fatty Acid Synthase Reveals Beads-on-a-String Configuration
Trujillo, Uldaeliz; Vázquez-Rosa, Edwin; Oyola-Robles, Delise; Stagg, Loren J.; Vassallo, David A.; Vega, Irving E.; Arold, Stefan T.; Baerga-Ortiz, Abel (Public Library of Science (PLoS), 2013-02-28)
The polyunsaturated fatty acid (PUFA) synthases from deep-sea bacteria invariably contain multiple acyl carrier protein (ACP) domains in tandem. This conserved tandem arrangement has been implicated in both amplification of fatty acid production (additive effect) and in structural stabilization of the multidomain protein (synergistic effect). While the more accepted model is one in which domains act independently, recent reports suggest that ACP domains may form higher oligomers. Elucidating the three-dimensional structure of tandem arrangements may therefore give important insights into the functional relevance of these structures, and hence guide bioengineering strategies. In an effort to elucidate the three-dimensional structure of tandem repeats from deep-sea anaerobic bacteria, we have expressed and purified a fragment consisting of five tandem ACP domains from the PUFA synthase from Photobacterium profundum. Analysis of the tandem ACP fragment by analytical gel filtration chromatography showed a retention time suggestive of a multimeric protein. However, small angle X-ray scattering (SAXS) revealed that the multi-ACP fragment is an elongated monomer which does not form a globular unit. Stokes radii calculated from atomic monomeric SAXS models were comparable to those measured by analytical gel filtration chromatography, showing that in the gel filtration experiment, the molecular weight was overestimated due to the elongated protein shape. Thermal denaturation monitored by circular dichroism showed that unfolding of the tandem construct was not cooperative, and that the tandem arrangement did not stabilize the protein. Taken together, these data are consistent with an elongated beads-on-a-string arrangement of the tandem ACP domains in PUFA synthases, and speak against synergistic biocatalytic effects promoted by quaternary structuring. Thus, it is possible to envision bioengineering strategies which simply involve the artificial linking of multiple ACP domains for increasing the yield of fatty acids in bacterial cultures. © 2013 Trujillo et al.