KAUST Department: Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
Preprint Posting Date: 2012-08-18
Online Publication Date: 2012-11-19
Print Publication Date: 2012
Permanent link to this record: http://hdl.handle.net/10754/325469
Abstract:
Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods.
Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods.
Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
© 2012 Wang et al; licensee BioMed Central Ltd.
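The alternating scheme described in the abstract above can be illustrated with a minimal sketch. This record does not give the objective function, so the form used here is an assumption: minimize sum_m mu_m * f^T L_m f + alpha * ||f - y||^2 + beta * ||mu||^2 subject to mu >= 0 and sum(mu) = 1, where L_m are graph Laplacians, y is the query indicator vector, f holds the ranking scores, and mu holds the graph weights. The function names and the parameters `alpha`, `beta`, and `n_iter` are hypothetical and are not taken from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def multi_graph_rank(laplacians, y, alpha=1.0, beta=1.0, n_iter=20):
    """Alternately update ranking scores f and graph weights mu.

    Assumed objective (not stated in this record):
        sum_m mu[m] * f @ L_m @ f + alpha * ||f - y||^2 + beta * ||mu||^2
    """
    n_graphs, n = len(laplacians), len(y)
    mu = np.full(n_graphs, 1.0 / n_graphs)    # start from uniform weights
    f = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # Step 1: with mu fixed, f has a closed-form solution.
        L = sum(w * Lm for w, Lm in zip(mu, laplacians))
        f = np.linalg.solve(L + alpha * np.eye(n), alpha * np.asarray(y, dtype=float))
        # Step 2: with f fixed, mu is a simplex projection of the
        # negated, scaled per-graph smoothness costs.
        costs = np.array([f @ Lm @ f for Lm in laplacians])
        mu = project_to_simplex(-costs / (2.0 * beta))
    return f, mu
```

Under this assumed objective, graphs on which the current scores vary less smoothly receive smaller weights, and a larger `beta` keeps the weights closer to uniform.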
Citation: Wang J, Bensmail H, Gao X (2012) Multiple graph regularized protein domain ranking. BMC Bioinformatics 13: 307. doi:10.1186/1471-2105-13-307.
PubMed Central ID: PMC3583823
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Automatic classification of protein structures relying on similarities between alignments.
- Authors: Santini G, Soldano H, Pothier J
- Issue date: 2012 Sep 14
- CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering.
- Authors: Cheng W, Guo Z, Zhang X, Wang W
- Issue date: 2016 Jul
- AliWABA: alignment on the web through an A-Bruijn approach.
- Authors: Jones NC, Zhi D, Raphael BJ
- Issue date: 2006 Jul 1
- Detection of distant structural similarities in a set of proteins using a fast graph-based method.
- Authors: Koch I, Lengauer T
- Issue date: 1997
- S4: structure-based sequence alignments of SCOP superfamilies.
- Authors: Casbon J, Saqi MA
- Issue date: 2005 Jan 1
Showing items related by title, author, creator and subject.
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Wang, Jim Jing-Yan; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2016-06-15) [Article]
Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.
Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.
Protein-protein interactions decoys datasets for machine learning algorithm development
Barradas Bautista, Didier; Almajed, Ali; Cavallo, Luigi; Kalnis, Panos; Oliva, Romina (KAUST Research Repository, 2021-01-20) [Dataset]
This is the most complete and diverse protein docking decoys set derived from the Benchmark5, Scorers_set. We used three different rigid-body docking programs to generate the decoys for the Benchmark5. We analyzed all docking decoys with more than 150 different scoring functions from different sources (CCharppi, FreeSASA, CIPS, CONSRANK). We provide balanced and unbalanced versions of the data. The balanced data is intended for training and testing machine learning algorithms. The unbalanced data is provided to simulate the real-world scenario. We also provide a set of rigid-body docking decoys from Interactome3D that spans 1391 interactions. We obtained the labels for this set using a weakly-supervised approach we called hAIkal. We used this data to augment the training data and improve machine learning classifiers.
A multi-directional rapidly exploring random graph (mRRG) for protein folding
Nath, Shuvra Kanti; Thomas, Shawna; Ekenna, Chinwe; Amato, Nancy M. (Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB '12, Association for Computing Machinery (ACM), 2012) [Conference Paper]
Modeling large-scale protein motions, such as those involved in folding and binding interactions, is crucial to better understanding not only how proteins move and interact with other molecules but also how proteins misfold, thus causing many devastating diseases. Robotic motion planning algorithms, such as Rapidly Exploring Random Trees (RRTs), have been successful in simulating protein folding pathways. Here, we propose a new multi-directional Rapidly Exploring Random Graph (mRRG) specifically tailored for proteins. Unlike traditional RRGs which only expand a parent conformation in a single direction, our strategy expands the parent conformation in multiple directions to generate new samples. Resulting samples are connected to the parent conformation and its nearest neighbors. By leveraging multiple directions, mRRG can model the protein motion landscape with reduced computational time compared to several other robotics-based methods for small to moderate-sized proteins. Our results on several proteins agree with experimental hydrogen out-exchange, pulse-labeling, and Φ-value analysis. We also show that mRRG covers the conformation space better as compared to the other computation methods. Copyright © 2012 ACM.
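The multi-directional expansion step described in the mRRG abstract above can be sketched in a toy form. This is an illustration under stated assumptions, not the authors' implementation: conformations are plain NumPy vectors, directions are drawn uniformly at random, there is no energy or validity check, and "its nearest neighbors" is read as the nearest existing nodes to each new sample. The function name and the parameters `n_dirs`, `step`, and `k` are hypothetical.

```python
import numpy as np

def expand_multi_directional(nodes, edges, parent, rng, n_dirs=4, step=0.1, k=2):
    """Grow the graph from nodes[parent] in n_dirs random directions.

    nodes: list of 1-D arrays (toy conformations), modified in place.
    edges: set of (i, j) index pairs, modified in place.
    Returns the indices of the newly added samples.
    """
    dim = nodes[parent].shape[0]
    new_ids = []
    for _ in range(n_dirs):
        # Draw a random unit direction and take one step from the parent.
        direction = rng.normal(size=dim)
        direction /= np.linalg.norm(direction)
        sample = nodes[parent] + step * direction
        # Find the k nearest existing nodes before adding the sample.
        dists = np.linalg.norm(np.array(nodes) - sample, axis=1)
        nearest = np.argsort(dists)[:k]
        nodes.append(sample)
        idx = len(nodes) - 1
        # Connect the sample to its parent and to its nearest neighbors.
        edges.add((parent, idx))
        for j in nearest:
            if int(j) != parent:
                edges.add((int(j), idx))
        new_ids.append(idx)
    return new_ids
```

A single-direction expansion corresponds to `n_dirs=1`; raising `n_dirs` adds several samples around the same parent in one step.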