Codon Deviation Coefficient: A novel measure for estimating codon usage bias and its statistical significance
Article - Full Text
Supplemental File 1
Supplemental File 2
Supplemental File 3
KAUST DepartmentBiological and Environmental Sciences and Engineering (BESE) Division
Computational Bioscience Research Center (CBRC)
Desert Agriculture Initiative
Online Publication Date2012-03-23
Print Publication Date2012
Permanent link to this recordhttp://hdl.handle.net/10754/325470
MetadataShow full item record
AbstractBackground: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.Results: Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts the bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance.Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions. 2012 Zhang et al; licensee BioMed Central Ltd.
CitationZhang Z, Li J, Cui P, Ding F, Li A, et al. (2012) Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 13: 43. doi:10.1186/1471-2105-13-43.
PubMed Central IDPMC3368730
The following license files are associated with this item:
Except where otherwise noted, this item's license is described as This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone.
- Authors: Gilchrist MA, Chen WC, Shah P, Landerer CL, Zaretzki R
- Issue date: 2015 May 14
- Constraint on di-nucleotides by codon usage bias in bacterial genomes.
- Authors: Satapathy SS, Powdel BR, Dutta M, Buragohain AK, Ray SK
- Issue date: 2014 Feb 15
- Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift.
- Authors: Shah P, Gilchrist MA
- Issue date: 2011 Jun 21
- Translational Selection for Speed Is Not Sufficient to Explain Variation in Bacterial Codon Usage Bias.
- Authors: Mahajan S, Agashe D
- Issue date: 2018 Feb 1
- scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes.
- Authors: O'Neill PK, Or M, Erill I
- Issue date: 2013
Showing items related by title, author, creator and subject.
Modeling compositional dynamics based on GC and purine contents of protein-coding sequencesZhang, Zhang; Yu, Jun (Biology Direct, Springer Nature, 2010-11-09) [Article]Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution and a large number of publically-available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft. 2010 Zhang and Yu; licensee BioMed Central Ltd.
Heritability in the efficiency of nonsense-mediated mRNA decay in humansSeoighe, Cathal; Gehring, Christoph A (PLoS ONE, Public Library of Science (PLoS), 2010-07-21) [Article]Background: In eukaryotes mRNA transcripts of protein-coding genes in which an intron has been retained in the coding region normally result in premature stop codons and are therefore degraded through the nonsense-mediated mRNA decay (NMD) pathway. There is evidence in the form of selective pressure for in-frame stop codons in introns and a depletion of length three introns that this is an important and conserved quality-control mechanism. Yet recent reports have revealed that the efficiency of NMD varies across tissues and between individuals, with important clinical consequences. Principal Findings: Using previously published Affymetrix exon microarray data from cell lines genotyped as part of the International HapMap project, we investigated whether there are heritable, inter-individual differences in the abundance of intron-containing transcripts, potentially reflecting differences in the efficiency of NMD. We identified intronic probesets using EST data and report evidence of heritability in the extent of intron expression in 56 HapMap trios. We also used a genome-wide association approach to identify genetic markers associated with intron expression. Among the top candidates was a SNP in the DCP1A gene, which forms part of the decapping complex, involved in NMD. Conclusions: While we caution that some of the apparent inter-individual difference in intron expression may be attributable to different handling or treatments of cell lines, we hypothesize that there is significant polymorphism in the process of NMD, resulting in heritable differences in the abundance of intronic mRNA. Part of this phenotype is likely to be due to a polymorphism in a decapping enzyme on human chromosome 3. © 2010 Seoighe, Gehring.
Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomesQu, Hongzhu; Wu, Hao; Zhang, Tongwu; Zhang, Zhang; Hu, Songnian; Yu, Jun (Research in Microbiology, Elsevier BV, 2010-12) [Article]Nucleotide compositional asymmetry (NCA) between leading and lagging strands (LeS and LaS) is dynamic and diverse among eubacterial genomes due to different mutation and selection forces. A thorough investigation is needed in order to study the relationship between nucleotide composition dynamics and gene distribution biases. Based on a collection of 364 eubacterial genomes that were grouped according to a DnaE-based scheme (DnaE1-DnaE1, DnaE2-DnaE1, and DnaE3-PolC), we investigated NCA and nucleotide composition gradients at three codon positions and found that there was universal G-enrichment on LeS among all groups. This was due to a strong selection for G-heading (codon position1 or cp1) codons and mutation pressure that led to more G-ending (cp3) codons. Moreover, a slight T-enrichment of LeS due to the mutation of cytosine deamination at cp3 was universal among DnaE1-DnaE1 and DnaE2-DnaE1 genomes, but was not clearly seen among DnaE3-PolC genomes, in which A-enrichment of LeS was proposed to be the effect of selections unique to polC and a mutation bias toward A-richness at cp1 that may be a result of transcription-coupled DNA repair mechanisms. Furthermore, strand-biased gene distribution enhances the purine-richness of LeS for DnaE3-PolC genomes and T-richness of LeS for DnaE1-DnaE1 and DnaE2-dnaE1 genomes. © 2010 Institut Pasteur.