KAUST Department:
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
Preprint Posting Date: 2012-08-18
Online Publication Date: 2012-11-19
Print Publication Date: 2012
Permanent link to this record: http://hdl.handle.net/10754/325469
Abstract

Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods.

Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods.

Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.

© 2012 Wang et al; licensee BioMed Central Ltd.
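The alternating scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common form of multiple graph regularization: the objective sum_m mu_m * f^T L_m f + alpha * ||f - y||^2 + beta * ||mu||^2, with the graph weights mu constrained to the probability simplex. The function names, the parameters `alpha` and `beta`, and the toy similarity matrices are all hypothetical and chosen only for illustration.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian D - W of a symmetric similarity matrix."""
    return np.diag(W.sum(axis=1)) - W

def project_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def solve_scores(laplacians, mu, y, alpha):
    """Ranking scores for fixed graph weights: (L + alpha*I) f = alpha*y."""
    L = sum(w * Lm for w, Lm in zip(mu, laplacians))
    return np.linalg.solve(L + alpha * np.eye(len(y)), alpha * y)

def multi_graph_rank(laplacians, y, alpha=1.0, beta=1.0, n_iter=20):
    """Alternate between ranking scores f and graph weights mu (assumed objective)."""
    mu = np.full(len(laplacians), 1.0 / len(laplacians))
    for _ in range(n_iter):
        # Step 1: fix mu, solve for f in closed form.
        f = solve_scores(laplacians, mu, y, alpha)
        # Step 2: fix f, minimize sum_m mu_m*e_m + beta*||mu||^2 on the simplex,
        # which is the simplex projection of -e / (2*beta).
        e = np.array([f @ Lm @ f for Lm in laplacians])
        mu = project_simplex(-e / (2.0 * beta))
    # Final scores consistent with the final weights.
    return solve_scores(laplacians, mu, y, alpha), mu

# Toy data (made up): 8 items in two groups of 4, and two candidate graphs.
n = 8
W_block = np.zeros((n, n))
W_block[:4, :4] = 1.0
W_block[4:, 4:] = 1.0
np.fill_diagonal(W_block, 0.0)

rng = np.random.default_rng(0)
A = rng.random((n, n))
W_noise = (A + A.T) / 2.0
np.fill_diagonal(W_noise, 0.0)

y = np.zeros(n)
y[0] = 1.0  # item 0 is the query

f, mu = multi_graph_rank([laplacian(W_block), laplacian(W_noise)], y)
print("graph weights:", mu)
print("ranking (best first):", np.argsort(-f))
```

In this sketch, each graph's smoothness cost f^T L_m f decides how much weight that graph receives, while `beta` keeps the weights from collapsing onto a single graph too aggressively. The paper itself should be consulted for the exact objective and graph construction used in MultiG-Rank.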
Citation: Wang J, Bensmail H, Gao X (2012) Multiple graph regularized protein domain ranking. BMC Bioinformatics 13: 307. doi:10.1186/1471-2105-13-307.
PubMed Central ID: PMC3583823
The following license files are associated with this item:
Except where otherwise noted, this item's license is described as: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Automatic classification of protein structures relying on similarities between alignments.
- Authors: Santini G, Soldano H, Pothier J
- Issue date: 2012 Sep 14
- ProClust: improved clustering of protein sequences with an extended graph-based approach.
- Authors: Pipenbacher P, Schliep A, Schneckener S, Schönhuth A, Schomburg D, Schrader R
- Issue date: 2002
- CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering.
- Authors: Cheng W, Guo Z, Zhang X, Wang W
- Issue date: 2016 Jul
- AliWABA: alignment on the web through an A-Bruijn approach.
- Authors: Jones NC, Zhi D, Raphael BJ
- Issue date: 2006 Jul 1
- A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
- Authors: Wei W, Gao B, Liu TY, Wang T, Li G, Li H
- Issue date: 2016 Apr
Showing items related by title, author, creator and subject.
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Wang, Jim Jing-Yan; Gao, Xin (Bioinformatics, Oxford University Press (OUP), 2016-06-15) [Article]
Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.
Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.
Protein-protein interactions decoys datasets for machine learning algorithm development
Barradas Bautista, Didier; Almajed, Ali; Cavallo, Luigi; Kalnis, Panos; Oliva, Romina (KAUST Research Repository, 2021-01-20) [Dataset]
This is the most complete and diverse protein docking decoys set derived from the Benchmark5, Scorers_set. We used three different rigid-body docking programs to generate the decoys for the Benchmark5. We analyzed all docking decoys with more than 150 different scoring functions from different sources (CCharppi, FreeSASA, CIPS, CONSRANK). We provide a balanced and an unbalanced version of the data. The balanced data is intended for training and testing machine learning algorithms. The unbalanced data is provided to simulate the real-world scenario. We also provide a set of rigid-body docking decoys from Interactome3D that spans 1391 interactions. We obtained the labels for this set using a weakly-supervised approach we called hAIkal. We used this data to augment the training data and improve machine learning classifiers.
A Faster Algorithm for Computing Motorcycle Graphs
Vigneron, Antoine E.; Yan, Lie (Discrete & Computational Geometry, Springer Nature, 2014-08-29) [Article]
We present a new algorithm for computing motorcycle graphs that runs in (Formula presented.) time for any (Formula presented.), improving on all previously known algorithms. The main application of this result is to computing the straight skeleton of a polygon. It allows us to compute the straight skeleton of a non-degenerate polygon with (Formula presented.) holes in (Formula presented.) expected time. If all input coordinates are (Formula presented.)-bit rational numbers, we can compute the straight skeleton of a (possibly degenerate) polygon with (Formula presented.) holes in (Formula presented.) expected time. In particular, it means that we can compute the straight skeleton of a simple polygon in (Formula presented.) expected time if all input coordinates are (Formula presented.)-bit rationals, while all previously known algorithms have worst-case running time (Formula presented.). © 2014 Springer Science+Business Media New York.