KAUST DepartmentComputational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
Preprint Posting Date2012-08-18
Online Publication Date2012-11-19
Print Publication Date2012
Permanent link to this recordhttp://hdl.handle.net/10754/325469
MetadataShow full item record
AbstractBackground: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods.Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods.Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. 2012 Wang et al; licensee BioMed Central Ltd.
CitationWang J, Bensmail H, Gao X (2012) Multiple graph regularized protein domain ranking. BMC Bioinformatics 13: 307. doi:10.1186/1471-2105-13-307.
PubMed Central IDPMC3583823
The following license files are associated with this item:
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Automatic classification of protein structures relying on similarities between alignments.
- Authors: Santini G, Soldano H, Pothier J
- Issue date: 2012 Sep 14
- ProClust: improved clustering of protein sequences with an extended graph-based approach.
- Authors: Pipenbacher P, Schliep A, Schneckener S, Schönhuth A, Schomburg D, Schrader R
- Issue date: 2002
- CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering.
- Authors: Cheng W, Guo Z, Zhang X, Wang W
- Issue date: 2016 Jul
- AliWABA: alignment on the web through an A-Bruijn approach.
- Authors: Jones NC, Zhi D, Raphael BJ
- Issue date: 2006 Jul 1
- A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
- Authors: Wei W, Gao B, Liu TY, Wang T, Li G, Li H
- Issue date: 2016 Apr
Showing items related by title, author, creator and subject.
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure predictionCui, Xuefeng; Lu, Zhiwu; wang, sheng; Wang, Jim Jing-Yan; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2016-06-15) [Article]Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.
Protein-protein interactions decoys datasets for machine learning algorithm developmentBarradas Bautista, Didier; Almajed, Ali; Cavallo, Luigi; Kalnis, Panos; Oliva, Romina (KAUST Research Repository, 2021-01-20) [Dataset]This is the most complete and diverse protein docking decoys set derived from the Benchmark5, Scorers_set. We used three different rigid-body docking programs to generate the decoys for the Bechmark5. We analyzed all docking decoys with more than 150 different scoring functions from different sources ( CCharppi, FreeSASA, CIPS, CONSRANK). We provide a balanced and unbalanced version of the data. This balanced data is intended for the training and test of machine learning algorithms. the unbalanced data is provided to simulated the real-world scenario. We also provide a set of rigid-body docking decoys from Interactome3D that spans 1391 interactions. We obtained the labels for this set using a weakly-supervised approach we called hAIkal. We used this data to augment the train data and improve machine learning classifiers.
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement LearningZhu, Deyao; Li, Li Erran; Elhoseiny, Mohamed (arXiv, 2022-06-09) [Preprint]World models in model-based reinforcement learning usually face unrealistic long-time-horizon prediction issues due to compounding errors as the prediction errors accumulate over timesteps. Recent works in graph-structured world models improve the long-horizon reasoning ability via building a graph to represent the environment, but they are designed in a goal-conditioned setting and cannot guide the agent to maximize episode returns in a traditional reinforcement learning setting without externally given target states. To overcome this limitation, we design a graph-structured world model in offline reinforcement learning by building a directed-graph-based Markov decision process (MDP) with rewards allocated to each directed edge as an abstraction of the original continuous environment. As our world model has small and finite state/action spaces compared to the original environment, value iteration can be easily applied here to estimate state values on the graph and figure out the best future. Unlike previous graph-structured world models that requires externally provided targets, our world model, dubbed Value Memory Graph (VMG), can provide the desired targets with high values by itself. VMG can be used to guide low-level goal-conditioned policies that are trained via supervised learning to maximize episode returns. Experiments on the D4RL benchmark show that VMG can outperform state-of-the-art methods in several tasks where long horizon reasoning ability is crucial. Code will be made publicly available.