Modeling compositional dynamics based on GC and purine contents of protein-coding sequences

Handle URI:
http://hdl.handle.net/10754/325263
Title:
Modeling compositional dynamics based on GC and purine contents of protein-coding sequences
Authors:
Zhang, Zhang; Yu, Jun
Abstract:
Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.
Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.
Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.
Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.
© 2010 Zhang and Yu; licensee BioMed Central Ltd.
KAUST Department:
Biological and Environmental Sciences and Engineering (BESE) Division; Plant Stress Genomics Research Lab
Citation:
Zhang Z, Yu J (2010) Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biology Direct 5: 63. doi:10.1186/1745-6150-5-63.
Publisher:
Springer Nature
Journal:
Biology Direct
Issue Date:
8-Nov-2010
DOI:
10.1186/1745-6150-5-63
PubMed ID:
21059261
PubMed Central ID:
PMC2989939
Type:
Article
ISSN:
1745-6150
Appears in Collections:
Articles; Biological and Environmental Sciences and Engineering (BESE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Zhang, Zhang | en
dc.contributor.author | Yu, Jun | en
dc.date.accessioned | 2014-08-27T09:43:14Z | -
dc.date.available | 2014-08-27T09:43:14Z | -
dc.date.issued | 2010-11-08 | en
dc.identifier.citation | Zhang Z, Yu J (2010) Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biology Direct 5: 63. doi:10.1186/1745-6150-5-63. | en
dc.identifier.issn | 1745-6150 | en
dc.identifier.pmid | 21059261 | en
dc.identifier.doi | 10.1186/1745-6150-5-63 | en
dc.identifier.uri | http://hdl.handle.net/10754/325263 | en
dc.description.abstract | Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge. Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences. Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms. Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. © 2010 Zhang and Yu; licensee BioMed Central Ltd. | en
dc.language.iso | en | en
dc.publisher | Springer Nature | en
dc.rights | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. | en
dc.rights.uri | http://creativecommons.org/licenses/by/2.0 | en
dc.subject | Archaea | en
dc.subject | Bacteria (microorganisms) | en
dc.subject | Eukaryota | en
dc.subject | amino acid | en
dc.subject | purine | en
dc.subject | purine derivative | en
dc.subject | codon | en
dc.subject | DNA base composition | en
dc.subject | genetics | en
dc.subject | metabolism | en
dc.subject | open reading frame | en
dc.subject | Amino Acids | en
dc.subject | Base Composition | en
dc.subject | Codon | en
dc.subject | Open Reading Frames | en
dc.subject | Purines | en
dc.title | Modeling compositional dynamics based on GC and purine contents of protein-coding sequences | en
dc.type | Article | en
dc.contributor.department | Biological and Environmental Sciences and Engineering (BESE) Division | en
dc.contributor.department | Plant Stress Genomics Research Lab | en
dc.identifier.journal | Biology Direct | en
dc.identifier.pmcid | PMC2989939 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.institution | Unidad Académica de Sistemas Arrecifales (Puerto Morelos), Instituto de Ciencias Del Mar y Limnología, Universidad Nacional Autónoma de México, Puerto Morelos, QR 77580, Mexico | en
dc.contributor.institution | School of Natural Sciences, University of California Merced, 5200 North Lake Road, Merced, CA 95343, United States | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Zhang, Zhang | en
kaust.author | Yu, Jun | en
This item is licensed under a Creative Commons License