CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

Handle URI:
http://hdl.handle.net/10754/615924
Title:
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction
Authors:
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin (ORCID: 0000-0002-7108-3574)
Abstract:
Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information.
Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration.
Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods.
KAUST Department:
Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Citation:
CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics 2016, 32(12):i332
Publisher:
Oxford University Press (OUP)
Journal:
Bioinformatics
Issue Date:
15-Jun-2016
DOI:
10.1093/bioinformatics/btw271
Type:
Article
ISSN:
1367-4803; 1460-2059
Sponsors:
The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04, National Natural Science Foundation of China (61573363), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (15XNLQ01), and IBM Global SUR Award Program. This research made use of the resources of the computer clusters at KAUST.
Additional Links:
http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btw271
Appears in Collections:
Articles

Full metadata record

DC Field | Value | Language
dc.contributor.author | Cui, Xuefeng | en
dc.contributor.author | Lu, Zhiwu | en
dc.contributor.author | Wang, Sheng | en
dc.contributor.author | Jing-Yan Wang, Jim | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2016-07-11T09:35:41Z | -
dc.date.available | 2016-07-11T09:35:41Z | -
dc.date.issued | 2016-06-15 | -
dc.identifier.citation | CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction 2016, 32 (12):i332 Bioinformatics | en
dc.identifier.issn | 1367-4803 | -
dc.identifier.issn | 1460-2059 | -
dc.identifier.doi | 10.1093/bioinformatics/btw271 | -
dc.identifier.uri | http://hdl.handle.net/10754/615924 | -
dc.description.abstract | Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. | en
dc.description.sponsorship | The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04, National Natural Science Foundation of China (61573363), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (15XNLQ01), and IBM Global SUR Award Program. This research made use of the resources of the computer clusters at KAUST. | en
dc.language.iso | en | en
dc.publisher | Oxford University Press (OUP) | en
dc.relation.url | http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btw271 | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.title | CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction | en
dc.type | Article | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division | en
dc.identifier.journal | Bioinformatics | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China | en
dc.contributor.institution | Toyota Technological Institute at Chicago, 6045 Kenwood Avenue, Chicago, IL 60637, USA | en
dc.contributor.institution | Department of Human Genetics, University of Chicago, E. 58th St, Chicago, IL 60637, USA | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Cui, Xuefeng | en
kaust.author | Jing-Yan Wang, Jim | en
kaust.author | Gao, Xin | en