Modeling compositional dynamics based on GC and purine contents of protein-coding sequences
Supplemental File 5
KAUST Department: Biological and Environmental Sciences and Engineering (BESE) Division
Plant Stress Genomics Research Lab
Abstract

Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.

Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.

Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.

Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.

© 2010 Zhang and Yu; licensee BioMed Central Ltd.
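The abstract names GC content and purine content, taken separately at the three codon positions, as the compositional parameters of the models. As an illustration only, the sketch below shows one way these observed quantities could be measured from a single coding sequence. It does not implement the authors' two models, which are not specified on this page. The function name `positional_contents` and the choice to skip ambiguous bases and any trailing partial codon are assumptions made for this example.

```python
from collections import Counter


def positional_contents(cds: str) -> dict:
    """Return GC and purine (A+G) fractions at codon positions 1, 2 and 3.

    Hypothetical helper for illustration; not the authors' method.
    Ambiguous bases (anything other than A, C, G, T) are skipped, and a
    trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA input as well
    usable = len(seq) - len(seq) % 3     # drop any incomplete final codon
    result = {}
    for pos in range(3):
        # Every third base, starting at this codon position.
        bases = [b for b in seq[pos:usable:3] if b in "ACGT"]
        total = len(bases)
        if total == 0:
            result[pos + 1] = None  # no usable bases at this position
            continue
        counts = Counter(bases)
        result[pos + 1] = {
            "gc": (counts["G"] + counts["C"]) / total,
            "purine": (counts["A"] + counts["G"]) / total,
        }
    return result


if __name__ == "__main__":
    # Toy sequence with two codons: ATG GCC
    for position, values in positional_contents("ATGGCC").items():
        print(position, values)
```

Measured values of this kind would be the "observed" side of the comparison described in the Conclusions; the "expected" compositions come from the models in the article itself.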
Citation: Zhang Z, Yu J (2010) Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biology Direct 5: 63. doi:10.1186/1745-6150-5-63.
PubMed Central ID: PMC2989939
Except where otherwise noted, this item's license is described as follows: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes.
- Authors: Knight RD, Freeland SJ, Landweber LF
- Issue date: 2001
- Amino acids as placeholders: base-composition pressures on protein length in malaria parasites and prokaryotes.
- Authors: Rayment JH, Forsdyke DR
- Issue date: 2005
- Compositional correlation studies among the three different codon positions in 12 bacterial genomes.
- Authors: Majumdar S, Gupta SK, Sundararajan VS, Ghosh TC
- Issue date: 1999 Dec 9
- Natural selection retains overrepresented out-of-frame stop codons against frameshift peptides in prokaryotes.
- Authors: Tse H, Cai JJ, Tsoi HW, Lam EP, Yuen KY
- Issue date: 2010 Sep 9
- Prokaryotes that grow optimally in acid have purine-poor codons in long open reading frames.
- Authors: Lin FH, Forsdyke DR
- Issue date: 2007 Jan