Towards fully automated structure-based NMR resonance assignment of 15N-labeled proteins from automatically picked peaks

Handle URI:
http://hdl.handle.net/10754/564361
Title:
Towards fully automated structure-based NMR resonance assignment of 15N-labeled proteins from automatically picked peaks
Authors:
Jang, Richard; Gao, Xin (0000-0002-7108-3574); Li, Ming
Abstract:
In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both 15N-labeled and 13C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only 15N-labeled NMR data and avoid the added expense of using 13C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using 15N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra without any manual intervention via automatically picked peaks, which are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the contact replacement (CR) method of Xiong, Pandurangan, and Bailey-Kellogg, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method fivefold on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant. © Copyright 2011, Mary Ann Liebert, Inc.
KAUST Department:
Applied Mathematics and Computational Science Program; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program; Computational Bioscience Research Center (CBRC); Structural and Functional Bioinformatics Group
Publisher:
Mary Ann Liebert Inc
Journal:
Journal of Computational Biology
Issue Date:
Mar-2011
DOI:
10.1089/cmb.2010.0251
PubMed ID:
21385039
Type:
Article
ISSN:
10665277
Sponsors:
We would like to thank Xiong, Pandurangan, and Bailey-Kellogg for providing us with their program and the test data for five proteins. We would like to thank our colleagues Babak Alipanahi, Frank Balbach, Dongbo Bu, Thorsten Dieckmann, Logan Donaldson, Emre Karakoc, and Shuai Cheng Li for thoughtful discussions. This work is partially supported by NSERC (Grant OGP0046506), China's MOST 863 (Grant 2008AA02Z313), Canada Research Chair program, MITACS, an NSERC Collaborative Grant, Premier's Discovery Award, SHARCNET, Cheriton Scholarship, and a grant from King Abdullah University of Science and Technology.
Appears in Collections:
Articles; Applied Mathematics and Computational Science Program; Structural and Functional Bioinformatics Group; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Jang, Richard | en
dc.contributor.author | Gao, Xin | en
dc.contributor.author | Li, Ming | en
dc.date.accessioned | 2015-08-04T06:24:56Z | en
dc.date.available | 2015-08-04T06:24:56Z | en
dc.date.issued | 2011-03 | en
dc.identifier.issn | 10665277 | en
dc.identifier.pmid | 21385039 | en
dc.identifier.doi | 10.1089/cmb.2010.0251 | en
dc.identifier.uri | http://hdl.handle.net/10754/564361 | en
dc.description.abstract | In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both 15N-labeled and 13C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only 15N-labeled NMR data and avoid the added expense of using 13C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using 15N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra without any manual intervention via automatically picked peaks, which are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the contact replacement (CR) method of Xiong, Pandurangan, and Bailey-Kellogg, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method fivefold on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant. © Copyright 2011, Mary Ann Liebert, Inc. | en
dc.description.sponsorship | We would like to thank Xiong, Pandurangan, and Bailey-Kellogg for providing us with their program and the test data for five proteins. We would like to thank our colleagues Babak Alipanahi, Frank Balbach, Dongbo Bu, Thorsten Dieckmann, Logan Donaldson, Emre Karakoc, and Shuai Cheng Li for thoughtful discussions. This work is partially supported by NSERC (Grant OGP0046506), China's MOST 863 (Grant 2008AA02Z313), Canada Research Chair program, MITACS, an NSERC Collaborative Grant, Premier's Discovery Award, SHARCNET, Cheriton Scholarship, and a grant from King Abdullah University of Science and Technology. | en
dc.publisher | Mary Ann Liebert Inc | en
dc.subject | Integer programming | en
dc.subject | NMR | en
dc.subject | peak picking | en
dc.subject | protein structure | en
dc.subject | resonance assignment | en
dc.title | Towards fully automated structure-based NMR resonance assignment of 15N-labeled proteins from automatically picked peaks | en
dc.type | Article | en
dc.contributor.department | Applied Mathematics and Computational Science Program | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.contributor.department | Structural and Functional Bioinformatics Group | en
dc.identifier.journal | Journal of Computational Biology | en
dc.contributor.institution | David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada | en
kaust.author | Gao, Xin | en
