CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction
Type
Article
KAUST Department
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
KAUST Grant Number
URF/1/1976-04
Date
2016-06-15
Online Publication Date
2016-06-15
Print Publication Date
2016-06-15
Permanent link to this record
http://hdl.handle.net/10754/615924
Abstract
Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.
Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.
Citation
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction 2016, 32 (12):i332 Bioinformatics
Sponsors
The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04, the National Natural Science Foundation of China (61573363), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (15XNLQ01), and the IBM Global SUR Award Program. This research made use of the resources of the computer clusters at KAUST.
Publisher
Oxford University Press (OUP)
Journal
Bioinformatics
PubMed ID
27307635
DOI
10.1093/bioinformatics/btw271
Related articles
- FALCON@home: a high-throughput protein structure prediction server based on remote homologue recognition.
- Authors: Wang C, Zhang H, Zheng WM, Xu D, Zhu J, Wang B, Ning K, Sun S, Li SC, Bu D
- Issue date: 2016 Feb 1
- Protein threading using residue co-variation and deep learning.
- Authors: Zhu J, Wang S, Bu D, Xu J
- Issue date: 2018 Jul 1
- Fuse: multiple network alignment via data fusion.
- Authors: Gligorijević V, Malod-Dognin N, Pržulj N
- Issue date: 2016 Apr 15
- Incorporating homologues into sequence embeddings for protein analysis.
- Authors: Eskin E, Snir S
- Issue date: 2007 Jun
- Protein-fold recognition using an improved single-source K diverse shortest paths algorithm.
- Authors: Lhota J, Xie L
- Issue date: 2016 Apr