Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories

Handle URI:
http://hdl.handle.net/10754/325468
Title:
Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories
Authors:
Chikalov, Igor; Yao, Peggy; Moshkov, Mikhail (ORCID: 0000-0003-0085-9483); Latombe, Jean-Claude
Abstract:
Background: Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. They form and break while a protein deforms, for instance during the transition from a non-functional to a functional state. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor.
Methods: This paper describes inductive learning methods to train protein-independent probabilistic models of H-bond stability from molecular dynamics (MD) simulation trajectories of various proteins. The training data contains 32 input attributes (predictors) that describe an H-bond and its local environment in a conformation c, and the output attribute is the probability that the H-bond will be present in an arbitrary conformation of this protein achievable from c within a time duration Δ. We model dependence of the output variable on the predictors by a regression tree.
Results: Several models are built using 6 MD simulation trajectories containing over 4000 distinct H-bonds (millions of occurrences). Experimental results demonstrate that such models can predict H-bond stability quite well. They perform roughly 20% better than models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a conformation. In most tests, about 80% of the 10% H-bonds predicted as the least stable are actually among the 10% truly least stable. The important attributes identified during the tree construction are consistent with previous findings.
Conclusions: We use inductive learning methods to build protein-independent probabilistic models to study H-bond stability, and demonstrate that the models perform better than H-bond energy alone.
© 2011 Chikalov et al; licensee BioMed Central Ltd.
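The abstract's output attribute and its "least stable 10%" test can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the frame-based window standing in for Δ, and the tie-breaking behaviour are all assumptions made here for illustration, and the abstract does not say how the probability is estimated in practice.

```python
from typing import List, Sequence


def stability_labels(present: Sequence[bool], window: int) -> List[float]:
    """Estimate stability for one H-bond from its presence time series.

    Assumption: for every frame t where the bond is present, stability is the
    fraction of the next `window` frames in which the bond is still present.
    Frames without a full window ahead of them are skipped.
    """
    labels = []
    for t in range(len(present) - window):
        if present[t]:
            future = present[t + 1 : t + 1 + window]
            labels.append(sum(future) / window)
    return labels


def least_stable_overlap(
    predicted: Sequence[float], observed: Sequence[float], fraction: float = 0.1
) -> float:
    """Share of the bonds predicted least stable that are truly least stable.

    Both inputs hold one stability value per bond, in the same order.
    """
    k = max(1, int(len(predicted) * fraction))
    # Indices of the k lowest values in each ranking.
    pred_idx = set(sorted(range(len(predicted)), key=lambda i: predicted[i])[:k])
    true_idx = set(sorted(range(len(observed)), key=lambda i: observed[i])[:k])
    return len(pred_idx & true_idx) / k
```

In this sketch, a value of 0.8 from `least_stable_overlap` with `fraction=0.1` would correspond to the abstract's statement that about 80% of the bonds predicted as the least stable 10% are among the truly least stable 10%.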
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Chikalov I, Yao P, Moshkov M, Latombe J-C (2011) Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories. BMC Bioinformatics 12: S34. doi:10.1186/1471-2105-12-S1-S34.
Publisher:
Springer Nature
Journal:
BMC Bioinformatics
Issue Date:
15-Feb-2011
DOI:
10.1186/1471-2105-12-S1-S34
PubMed ID:
21342565
PubMed Central ID:
PMC3044290
Type:
Article
ISSN:
1471-2105
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Chikalov, Igor | en
dc.contributor.author | Yao, Peggy | en
dc.contributor.author | Moshkov, Mikhail | en
dc.contributor.author | Latombe, Jean-Claude | en
dc.date.accessioned | 2014-08-27T09:52:39Z | -
dc.date.available | 2014-08-27T09:52:39Z | -
dc.date.issued | 2011-02-15 | en
dc.identifier.citation | Chikalov I, Yao P, Moshkov M, Latombe J-C (2011) Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories. BMC Bioinformatics 12: S34. doi:10.1186/1471-2105-12-S1-S34. | en
dc.identifier.issn | 1471-2105 | en
dc.identifier.pmid | 21342565 | en
dc.identifier.doi | 10.1186/1471-2105-12-S1-S34 | en
dc.identifier.uri | http://hdl.handle.net/10754/325468 | en
dc.description.abstract | Background: Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. They form and break while a protein deforms, for instance during the transition from a non-functional to a functional state. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor. Methods: This paper describes inductive learning methods to train protein-independent probabilistic models of H-bond stability from molecular dynamics (MD) simulation trajectories of various proteins. The training data contains 32 input attributes (predictors) that describe an H-bond and its local environment in a conformation c, and the output attribute is the probability that the H-bond will be present in an arbitrary conformation of this protein achievable from c within a time duration Δ. We model dependence of the output variable on the predictors by a regression tree. Results: Several models are built using 6 MD simulation trajectories containing over 4000 distinct H-bonds (millions of occurrences). Experimental results demonstrate that such models can predict H-bond stability quite well. They perform roughly 20% better than models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a conformation. In most tests, about 80% of the 10% H-bonds predicted as the least stable are actually among the 10% truly least stable. The important attributes identified during the tree construction are consistent with previous findings. Conclusions: We use inductive learning methods to build protein-independent probabilistic models to study H-bond stability, and demonstrate that the models perform better than H-bond energy alone. © 2011 Chikalov et al; licensee BioMed Central Ltd. | en
dc.language.iso | en | en
dc.publisher | Springer Nature | en
dc.rights | This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/2.0 | en
dc.subject | Inductive learning methods | en
dc.subject | Intrinsic strength | en
dc.subject | Local environments | en
dc.subject | Molecular dynamics simulations | en
dc.subject | Output variables | en
dc.subject | Probabilistic models | en
dc.subject | Protein structures | en
dc.subject | Tree construction | en
dc.subject | Regression trees | en
dc.subject | Forestry | en
dc.subject | Hydrogen bonds | en
dc.subject | Learning systems | en
dc.subject | Molecular dynamics | en
dc.subject | Proteins | en
dc.subject | Stability | en
dc.subject | Trajectories | en
dc.subject | Bioinformatics | en
dc.subject | Computer simulation | en
dc.subject | protein | en
dc.subject | algorithm | en
dc.subject | chemistry | en
dc.subject | hydrogen bond | en
dc.subject | molecular dynamics | en
dc.subject | protein secondary structure | en
dc.subject | protein stability | en
dc.subject | statistical model | en
dc.subject | Algorithms | en
dc.subject | Hydrogen Bonding | en
dc.subject | Models, Statistical | en
dc.subject | Molecular Dynamics Simulation | en
dc.subject | Protein Stability | en
dc.subject | Protein Structure, Secondary | en
dc.subject | Proteins | en
dc.title | Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | BMC Bioinformatics | en
dc.identifier.pmcid | PMC3044290 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Biomedical Informatics, Stanford University, Stanford, CA 94305, United States | en
dc.contributor.institution | Computer Science Department, Stanford University, Stanford, CA 94305, United States | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Chikalov, Igor | en
kaust.author | Moshkov, Mikhail | en

This item is licensed under a Creative Commons License
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.