Locus Reference Genomic sequences: An improved basis for describing human DNA variants
Tully, Raymond E
McLaren, William M
Vaughan, Brendan W
Taschner, Peter EM
den Dunnen, Johan T
Brookes, Anthony J
Maglott, Donna R
KAUST Department: Computational Bioscience Research Center (CBRC)
Permanent link to this record: http://hdl.handle.net/10754/325274
Abstract: As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). © 2010 Dalgleish et al.; licensee BioMed Central Ltd.
Citation: Dalgleish R, Flicek P, Cunningham F, Astashyn A, Tully RE, et al. (2010) Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Medicine 2: 24. doi:10.1186/gm145.
PubMed Central ID: PMC2873802
Except where otherwise noted, this item's license is described as: This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants.
- Authors: MacArthur JA, Morales J, Tully RE, Astashyn A, Gil L, Bruford EA, Larsson P, Flicek P, Dalgleish R, Maglott DR, Cunningham F
- Issue date: 2014 Jan
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
- Authors: Zhang DL, Ji L, Li YD
- Issue date: 2004 May
- Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker.
- Authors: Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PE
- Issue date: 2008 Jan
- Describing Sequence Variants Using HGVS Nomenclature.
- Authors: den Dunnen JT
- Issue date: 2017
- Seshat: A Web service for accurate annotation, validation, and analysis of TP53 variants generated by conventional and next-generation sequencing.
- Authors: Tikkanen T, Leroy B, Fournier JL, Risques RA, Malcikova J, Soussi T
- Issue date: 2018 Jul
Showing items related by title, author, creator and subject.
Population structure of Atlantic Mackerel inferred from RAD-seq derived SNP markers: effects of sequence clustering parameters and hierarchical SNP selection
Rodríguez-Ezpeleta, Naiara; Bradbury, Ian R.; Mendibil, Iñaki; Álvarez, Paula; Cotano, Unai; Irigoien, Xabier (Molecular Ecology Resources, Wiley-Blackwell, 2016-03-03) [Article]
Restriction-site associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in non-model organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD-seq data analyses rely on assumptions on nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species.
Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates
Black, PA; de Vos, M.; Louw, GE; van der Merwe, RG; Dippenaar, A.; Streicher, EM; Abdallah, A. M.; Sampson, SL; Victor, TC; Dolby, T.; Simpson, JA; van Helden, PD; Warren, RM; Pain, Arnab (BMC Genomics, Springer Nature, 2015-10-24) [Article]
Background: Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis.
Methods: Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline.
Results: Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30% to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event.
Conclusions: Our study demonstrated true levels of genetic diversity within an M. tuberculosis population and showed that genetic diversity may be re-defined when a selective pressure, such as drug exposure, is imposed on M. tuberculosis populations during the course of infection. This suggests that the genome of M. tuberculosis is more dynamic than previously thought, suggesting preparedness to respond to a changing environment.