3DSwap: Curated knowledgebase of proteins involved in 3D domain swapping
Shingate, Prashant N.
Manjunath, S. C. P.
Online Publication Date: 2011-09-29
Print Publication Date: 2011-09-29
Permanent link to this record: http://hdl.handle.net/10754/325442
Abstract: Three-dimensional domain swapping is a unique protein structural phenomenon where two or more protein chains in a protein oligomer share a common structural segment between individual chains. This phenomenon is observed in an array of protein structures in oligomeric conformation. Protein structures in swapped conformations perform diverse functional roles and are also associated with deposition diseases in humans. We have performed in-depth literature curation and structural bioinformatics analyses to develop an integrated knowledgebase of proteins involved in 3D domain swapping. The hallmark of 3D domain swapping is the presence of distinct structural segments such as the hinge and swapped regions. We have curated the literature to delineate the boundaries of these regions. In addition, we have defined several new concepts like 'secondary major interface' to represent the interface properties arising as a result of 3D domain swapping, and a new quantitative measure for the 'extent of swapping' in structures. The catalog of proteins reported in the 3DSwap knowledgebase has been generated using an integrated structural bioinformatics workflow of database searches, literature curation, structure visualization and sequence-structure-function analyses. The current version of the 3DSwap knowledgebase reports 293 protein structures; the analysis of such a compendium of protein structures will further the understanding of the molecular factors driving 3D domain swapping. © The Author(s) 2011.
Citation: Shameer K, Shingate PN, Manjunath SCP, Karthika M, Pugalenthi G, et al. (2011) 3DSwap: curated knowledgebase of proteins involved in 3D domain swapping. Database 2011: bar042-bar042. doi:10.1093/database/bar042.
Publisher: Oxford University Press (OUP)
PubMed Central ID: PMC3294423
Except where otherwise noted, this item's license is described as follows: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
- 3dswap-pred: prediction of 3D domain swapping from protein sequence using Random Forest approach.
- Authors: Shameer K, Pugalenthi G, Kandaswamy KK, Sowdhamini R
- Issue date: 2011 Oct
- Functional repertoire, molecular pathways and diseases associated with 3D domain swapping in the human proteome.
- Authors: Shameer K, Sowdhamini R
- Issue date: 2012 Apr 3
- 3D domain swapping: as domains continue to swap.
- Authors: Liu Y, Eisenberg D
- Issue date: 2002 Jun
- Three-dimensional domain swapping in the protein structure space.
- Authors: Huang Y, Cao H, Liu Z
- Issue date: 2012 Jun
- Exploring the Roles of Proline in Three-Dimensional Domain Swapping from Structure Analysis and Molecular Dynamics Simulations.
- Authors: Huang Y, Gao M, Su Z
- Issue date: 2018 Feb
Showing items related by title, author, creator and subject.
ProDis-ContSHC: Learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
Wang, Jim Jing-Yan; Gao, Xin; Wang, Quanquan; Li, Yongping (BMC Bioinformatics, Springer Nature, 2012-05-08) [Article]
Background: The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind of pairwise measure suffers from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy in retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning contextual dissimilarity/similarity measures can improve retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database.
Results: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure d_ij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing d_ij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and obtain the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, i.e., the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information.
Conclusions: Using the contextual proteins with their class labels in the database, we can improve the accuracy of the pairwise dissimilarity/similarity measures dramatically for the protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature. © 2012 Wang et al.; licensee BioMed Central Ltd.
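The core idea in the ProDis-ContSHC abstract above (shrink the dissimilarity of a pair whose neighborhoods look alike) can be illustrated with a minimal sketch. This is not the authors' method: the abstract describes a supervised factor learned with an SVM over hierarchical sub-contexts, whereas the sketch below swaps in a simple unsupervised Jaccard overlap of k-nearest-neighbor sets. The function names, the parameters `k` and `alpha`, and the scaling formula are all assumptions made for illustration.

```python
import numpy as np

def knn_context(D, k):
    """For each item i, return the set of its k nearest neighbors under D (excluding i)."""
    n = D.shape[0]
    contexts = []
    for i in range(n):
        order = np.argsort(D[i], kind="stable")
        neighbors = [int(j) for j in order if j != i][:k]
        contexts.append(set(neighbors))
    return contexts

def contextual_dissimilarity(D, k=2, alpha=0.5):
    """Scale each d_ij by a factor that decreases as the contexts N(i), N(j) overlap.

    ASSUMPTION: the factor is 1 - alpha * Jaccard(N(i), N(j)); the paper instead
    learns this factor from class labels, which is not reproduced here.
    """
    contexts = knn_context(D, k)
    n = D.shape[0]
    D_new = D.astype(float).copy()
    for i in range(n):
        for j in range(i + 1, n):
            union = contexts[i] | contexts[j]
            overlap = len(contexts[i] & contexts[j]) / len(union) if union else 0.0
            factor = 1.0 - alpha * overlap
            D_new[i, j] = D_new[j, i] = D[i, j] * factor
    return D_new

# Toy data: four hypothetical items placed on a line at 0, 1, 2 and 10.
positions = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(positions[:, None] - positions[None, :])
D_ctx = contextual_dissimilarity(D, k=2, alpha=0.5)
```

With `alpha=0` the input matrix is returned unchanged, so the parameter controls how strongly context is allowed to override the original pairwise measure.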
The human interactome knowledge base (hint-kb): An integrative human protein interaction database enriched with predicted protein–protein interaction scores using a novel hybrid technique
Theofilatos, Konstantinos A.; Dimitrakopoulos, Christos M.; Likothanassis, Spiridon D.; Kleftogiannis, Dimitrios A.; Moschopoulos, Charalampos N.; Alexakos, Christos; Papadimitriou, Stergios; Mavroudi, Seferina P. (Artificial Intelligence Review, Springer Nature, 2013-07-12) [Article]
Proteins are the functional components of many cellular processes and the identification of their physical protein–protein interactions (PPIs) is an area of mature academic research. Various databases have been developed containing information about experimentally and computationally detected human PPIs as well as their corresponding annotation data. However, these databases contain many false positive interactions, are partial, and only a few of them incorporate data from various sources. To overcome these limitations, we have developed HINT-KB (http://biotools.ceid.upatras.gr/hint-kb/), a knowledge base that integrates data from various sources, provides a user-friendly interface for their retrieval, calculates a set of features of interest and computes a confidence score for every candidate protein interaction. This confidence score is essential for filtering the false positive interactions which are present in existing databases, predicting new protein interactions and measuring the frequency of each true protein interaction. For this reason, a novel machine learning hybrid methodology, called Evolutionary Kalman Mathematical Modelling (EvoKalMaModel), was used to achieve an accurate and interpretable scoring methodology. The experimental results indicated that the proposed scoring scheme outperforms existing computational methods for the prediction of PPIs.
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Wang, Jim Jing-Yan; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2016-06-15) [Article]
Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.
Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.