KAUST Department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computational Bioscience Research Center (CBRC)
Abstract
Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods.
Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods.
Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
© 2012 Wang et al; licensee BioMed Central Ltd.
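Illustrative sketch (not taken from the article): the alternating scheme described in the abstract, in which ranking scores and graph weights are updated in turn, might look roughly like the following. Everything specific here is an assumption made for illustration: the objective sum_m mu_m f^T L_m f + alpha ||f - y||^2 + beta ||mu||^2 with mu constrained to the probability simplex, the unnormalized Laplacian, the parameter names alpha and beta, the helper names, and the toy graphs. The published MultiG-Rank formulation may differ in any of these details.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric weight matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            k = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= k * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def project_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def smoothness(L, f):
    """Quadratic form f^T L f: how much the scores f vary across the graph."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def multi_graph_rank(graphs, y, alpha=1.0, beta=1.0, iters=20):
    """Alternately update ranking scores f and graph weights mu (assumed objective)."""
    laps = [laplacian(W) for W in graphs]
    n, m = len(y), len(laps)
    mu = [1.0 / m] * m  # start from equal graph weights
    f = y[:]
    for _ in range(iters):
        # Fix mu: the minimizer over f solves (sum_m mu_m L_m + alpha I) f = alpha y.
        A = [[sum(mu[k] * laps[k][i][j] for k in range(m))
              + (alpha if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        f = solve(A, [alpha * v for v in y])
        # Fix f: minimizing sum_m mu_m s_m + beta ||mu||^2 over the simplex
        # equals projecting -s / (2 beta) onto the simplex.
        s = [smoothness(L, f) for L in laps]
        mu = project_simplex([-sk / (2.0 * beta) for sk in s])
    return f, mu


# Toy data (hypothetical): four domains, two candidate similarity graphs,
# and a query vector y that marks domain 0 as the query.
W1 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
W2 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
y = [1.0, 0.0, 0.0, 0.0]
scores, weights = multi_graph_rank([W1, W2], y)
print("ranking scores:", scores)
print("graph weights:", weights)
```

In this sketch, a graph on which the current scores vary less receives a larger weight, and beta controls how evenly the weight is spread across graphs. For databases of realistic size, the hand-written solver would normally be replaced by a sparse linear algebra routine.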
Citation: Wang J, Bensmail H, Gao X (2012) Multiple graph regularized protein domain ranking. BMC Bioinformatics 13: 307. doi:10.1186/1471-2105-13-307.
PubMed Central ID: PMC3583823
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- ProClust: improved clustering of protein sequences with an extended graph-based approach.
- Authors: Pipenbacher P, Schliep A, Schneckener S, Schönhuth A, Schomburg D, Schrader R
- Issue date: 2002
- Automatic classification of protein structures relying on similarities between alignments.
- Authors: Santini G, Soldano H, Pothier J
- Issue date: 2012 Sep 14
- CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering.
- Authors: Cheng W, Guo Z, Zhang X, Wang W
- Issue date: 2016 Jul
- AliWABA: alignment on the web through an A-Bruijn approach.
- Authors: Jones NC, Zhi D, Raphael BJ
- Issue date: 2006 Jul 1
- Detection of distant structural similarities in a set of proteins using a fast graph-based method.
- Authors: Koch I, Lengauer T
- Issue date: 1997
Showing items related by title, author, creator and subject.
Solution Structure of the Tandem Acyl Carrier Protein Domains from a Polyunsaturated Fatty Acid Synthase Reveals Beads-on-a-String Configuration
Trujillo, Uldaeliz; Vázquez-Rosa, Edwin; Oyola-Robles, Delise; Stagg, Loren J.; Vassallo, David A.; Vega, Irving E.; Arold, Stefan T.; Baerga-Ortiz, Abel (Public Library of Science (PLoS), 2013-02-28)
The polyunsaturated fatty acid (PUFA) synthases from deep-sea bacteria invariably contain multiple acyl carrier protein (ACP) domains in tandem. This conserved tandem arrangement has been implicated in both amplification of fatty acid production (additive effect) and in structural stabilization of the multidomain protein (synergistic effect). While the more accepted model is one in which domains act independently, recent reports suggest that ACP domains may form higher oligomers. Elucidating the three-dimensional structure of tandem arrangements may therefore give important insights into the functional relevance of these structures, and hence guide bioengineering strategies. In an effort to elucidate the three-dimensional structure of tandem repeats from deep-sea anaerobic bacteria, we have expressed and purified a fragment consisting of five tandem ACP domains from the PUFA synthase from Photobacterium profundum. Analysis of the tandem ACP fragment by analytical gel filtration chromatography showed a retention time suggestive of a multimeric protein. However, small angle X-ray scattering (SAXS) revealed that the multi-ACP fragment is an elongated monomer which does not form a globular unit. Stokes radii calculated from atomic monomeric SAXS models were comparable to those measured by analytical gel filtration chromatography, showing that in the gel filtration experiment, the molecular weight was overestimated due to the elongated protein shape. Thermal denaturation monitored by circular dichroism showed that unfolding of the tandem construct was not cooperative, and that the tandem arrangement did not stabilize the protein. Taken together, these data are consistent with an elongated beads-on-a-string arrangement of the tandem ACP domains in PUFA synthases, and speak against synergistic biocatalytic effects promoted by quaternary structuring. Thus, it is possible to envision bioengineering strategies which simply involve the artificial linking of multiple ACP domains for increasing the yield of fatty acids in bacterial cultures. © 2013 Trujillo et al.
Structural analysis and dimerization profile of the SCAN domain of the pluripotency factor Zfp206
Liang, Yu; Huimei Hong, Felicia; Ganesan, Pugalenthi; Jiang, Sizun; Jauch, Ralf; Stanton, Lawrence W.; Kolatkar, Prasanna R. (Oxford University Press (OUP), 2012-06-26)
Zfp206 (also named as Zscan10) belongs to the subfamily of C2H2 zinc finger transcription factors, which is characterized by the N-terminal SCAN domain. The SCAN domain mediates self-association and association between the members of SCAN family transcription factors, but the structural basis and selectivity determinants for complex formation is unknown. Zfp206 is important for maintaining the pluripotency of embryonic stem cells presumably by combinatorial assembly of itself or other SCAN family members on enhancer regions. To gain insights into the folding topology and selectivity determinants for SCAN dimerization, we solved the 1.85 Å crystal structure of the SCAN domain of Zfp206. In vitro binding studies using a panel of 20 SCAN proteins indicate that the SCAN domain Zfp206 can selectively associate with other members of SCAN family transcription factors. Deletion mutations showed that the N-terminal helix 1 is critical for heterodimerization. Double mutations and multiple mutations based on the Zfp206SCAN-Zfp110SCAN model suggested that domain swapped topology is a possible preference for Zfp206SCAN-Zfp110SCAN heterodimer. Together, we demonstrate that the Zfp206SCAN constitutes a protein module that enables C2H2 transcription factor dimerization in a highly selective manner using a domain-swapped interface architecture and identify novel partners for Zfp206 during embryonal development. © 2012 The Author(s).
Simplified method to predict mutual interactions of human transcription factors based on their primary structure
Schmeier, Sebastian; Jankovic, Boris R.; Bajic, Vladimir B. (Public Library of Science (PLoS), 2011-07-05)
Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to better understanding the complex machinery involved in gene regulation.
Methodology: We propose here such a method for the prediction of TF interactions. The method uses only the primary sequence information of the interacting TFs, resulting in a much greater simplicity of the prediction algorithm. Through an advanced feature selection process, we determined a subset of 97 model features that constitute the optimized model in the subset we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved despite the selection for the negative data set of only those TF from the same type of proteins, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such selection poses significant challenges for developing models with high specificity, but at the same time better reflects real-world problems.
Conclusions: The performance of our predictor compares well to those of much more complex approaches for predicting TF and general protein-protein interactions, particularly when taking the reduced complexity of model utilisation into account. © 2011 Schmeier et al.