Codon Deviation Coefficient: A novel measure for estimating codon usage bias and its statistical significance
KAUST Department: Computational Bioscience Research Center (CBRC)
Abstract
Background: Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.
Results: Here we propose a novel measure, Codon Deviation Coefficient (CDC), that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance.
Conclusions: As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.
© 2012 Zhang et al.; licensee BioMed Central Ltd.
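The two ideas named in the abstract, a background expectation built from position-specific nucleotide composition and a resampling test for significance, can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give the CDC formula. The sketch assumes the standard genetic code, uses per-position base frequencies as the background model, measures deviation as one minus the cosine similarity between expected and observed codon usage, and draws replicates from the expected distribution (a parametric bootstrap). All function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
SENSE_CODONS = [c for c in map("".join, itertools.product(BASES, repeat=3))
                if c not in STOP_CODONS]
SENSE_SET = set(SENSE_CODONS)


def split_codons(seq):
    """Split an in-frame sequence into sense codons; stops and ambiguous triplets are dropped."""
    seq = seq.upper()
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    return [c for c in triplets if c in SENSE_SET]


def expected_usage(codons):
    """Expected codon frequencies: product of per-position base frequencies,
    renormalized over the sense codons (illustrative background model)."""
    n = len(codons)
    pos = [Counter(c[p] for c in codons) for p in range(3)]
    raw = [pos[0][c[0]] * pos[1][c[1]] * pos[2][c[2]] / n ** 3 for c in SENSE_CODONS]
    total = sum(raw)
    return [x / total for x in raw]


def observed_usage(codons):
    """Observed relative frequency of each sense codon."""
    counts = Counter(codons)
    return [counts[c] / len(codons) for c in SENSE_CODONS]


def deviation(codons):
    """One minus the cosine similarity between expected and observed usage (0 = no bias)."""
    if not codons:
        raise ValueError("no sense codons in sequence")
    exp, obs = expected_usage(codons), observed_usage(codons)
    dot = sum(e * o for e, o in zip(exp, obs))
    norm = math.sqrt(sum(e * e for e in exp)) * math.sqrt(sum(o * o for o in obs))
    return max(0.0, 1.0 - dot / norm)  # clamp tiny negative rounding errors


def bootstrap_p_value(codons, n_boot=1000, seed=0):
    """Fraction of background-sampled replicates whose deviation reaches the observed one."""
    rng = random.Random(seed)
    observed = deviation(codons)
    weights = expected_usage(codons)
    hits = 0
    for _ in range(n_boot):
        replicate = rng.choices(SENSE_CODONS, weights=weights, k=len(codons))
        if deviation(replicate) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)  # add-one correction keeps p above zero
```

For example, `deviation(split_codons("AAACCC" * 50))` scores a sequence that uses only two of the eight codons its base composition allows, and `bootstrap_p_value` on the same codons estimates how often that much deviation arises from the background model alone.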
Citation: Zhang Z, Li J, Cui P, Ding F, Li A, et al. (2012) Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics 13: 43. doi:10.1186/1471-2105-13-43.
PubMed Central ID: PMC3368730
Except where otherwise noted, this item's license is described as follows: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone.
- Authors: Gilchrist MA, Chen WC, Shah P, Landerer CL, Zaretzki R
- Issue date: 2015 May 14
- Constraint on di-nucleotides by codon usage bias in bacterial genomes.
- Authors: Satapathy SS, Powdel BR, Dutta M, Buragohain AK, Ray SK
- Issue date: 2014 Feb 15
- Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift.
- Authors: Shah P, Gilchrist MA
- Issue date: 2011 Jun 21
- scnRCA: a novel method to detect consistent patterns of translational selection in mutationally-biased genomes.
- Authors: O'Neill PK, Or M, Erill I
- Issue date: 2013
- Codon usage in twelve species of Drosophila.
- Authors: Vicario S, Moriyama EN, Powell JR
- Issue date: 2007 Nov 15
Showing items related by title, author, creator and subject.
Modeling compositional dynamics based on GC and purine contents of protein-coding sequences
Zhang, Zhang; Yu, Jun (Springer Nature, 2010-11-08)
Background: Understanding the compositional dynamics of genomes and their coding sequences is of great significance in gaining clues into molecular evolution, and a large number of publicly available genome sequences have allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.
Results: To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.
Conclusions: We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.
Reviewers: This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.
© 2010 Zhang and Yu; licensee BioMed Central Ltd.
Heritability in the efficiency of nonsense-mediated mRNA decay in humans
Seoighe, Cathal; Gehring, Christoph A (Public Library of Science (PLoS), 2010-07-21)
Background: In eukaryotes mRNA transcripts of protein-coding genes in which an intron has been retained in the coding region normally result in premature stop codons and are therefore degraded through the nonsense-mediated mRNA decay (NMD) pathway. There is evidence in the form of selective pressure for in-frame stop codons in introns and a depletion of length three introns that this is an important and conserved quality-control mechanism. Yet recent reports have revealed that the efficiency of NMD varies across tissues and between individuals, with important clinical consequences.
Principal Findings: Using previously published Affymetrix exon microarray data from cell lines genotyped as part of the International HapMap project, we investigated whether there are heritable, inter-individual differences in the abundance of intron-containing transcripts, potentially reflecting differences in the efficiency of NMD. We identified intronic probesets using EST data and report evidence of heritability in the extent of intron expression in 56 HapMap trios. We also used a genome-wide association approach to identify genetic markers associated with intron expression. Among the top candidates was a SNP in the DCP1A gene, which forms part of the decapping complex, involved in NMD.
Conclusions: While we caution that some of the apparent inter-individual difference in intron expression may be attributable to different handling or treatments of cell lines, we hypothesize that there is significant polymorphism in the process of NMD, resulting in heritable differences in the abundance of intronic mRNA. Part of this phenotype is likely to be due to a polymorphism in a decapping enzyme on human chromosome 3.
© 2010 Seoighe, Gehring.
On the Organizational Dynamics of the Genetic Code
Zhang, Zhang; Yu, Jun (Elsevier BV, 2011-06-07)
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides (adenine, thymine, guanine and cytosine) according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with its intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.