Locus Reference Genomic sequences: An improved basis for describing human DNA variants
Tully, Raymond E
McLaren, William M
Vaughan, Brendan W
Taschner, Peter EM
den Dunnen, Johan T
Brookes, Anthony J
Maglott, Donna R
KAUST Department: Computational Bioscience Research Center (CBRC)
Online Publication Date: 2010-04-15
Print Publication Date: 2010
Permanent link to this record: http://hdl.handle.net/10754/325274
Abstract: As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs (which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI)), along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature, will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). © 2010 Dalgleish et al.; licensee BioMed Central Ltd.
Citation: Dalgleish R, Flicek P, Cunningham F, Astashyn A, Tully RE, et al. (2010) Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Medicine 2: 24. doi:10.1186/gm145.
PubMed Central ID: PMC2873802
Except where otherwise noted, this item's license is described as follows: This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- Locus Reference Genomic: reference sequences for the reporting of clinically relevant sequence variants.
- Authors: MacArthur JA, Morales J, Tully RE, Astashyn A, Gil L, Bruford EA, Larsson P, Flicek P, Dalgleish R, Maglott DR, Cunningham F
- Issue date: 2014 Jan
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
- Authors: Zhang DL, Ji L, Li YD
- Issue date: 2004 May
- Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker.
- Authors: Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PE
- Issue date: 2008 Jan
- Describing Sequence Variants Using HGVS Nomenclature.
- Authors: den Dunnen JT
- Issue date: 2017
- Describing structural changes by extending HGVS sequence variation nomenclature.
- Authors: Taschner PE, den Dunnen JT
- Issue date: 2011 May
Showing items related by title, author, creator and subject.
Data for: Poly(A) Dataset for PAS sequences and pseudo-PAS sequences Classification (fasta format)
Albalawi, Fahad; Chahid, Abderrazak; Guo, Xingang; Albaradei, Somayah; Magana-Mora, Arturo; Jankovic, Boris R.; Uludag, Mahmut; Van Neste, Christophe; Essack, Magbubah; Laleg-Kirati, Taous-Meriem; Bajic, Vladimir B. (2018-11-15) [Dataset]
This dataset contains DNA sequences of the human genome hg38 from the GENCODE folder on the EBI FTP server (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/GRCh38.primary_assembly.genome.fa.gz).
A. Positive set (PAS sequences): Using the GENCODE annotation for poly(A) (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.polyAs.gff3.gz), we selected the poly(A) signal annotations. Using bedtools slop, we obtained regions extending 300 bp upstream and 300 bp downstream of the poly(A) hexamer. Using bedtools getfasta, we extracted 606 bp FASTA sequences from these regions. After eliminating duplicates, we obtained 37,516 presumed true functional poly(A) signal (PAS) sequences. Sequences from this set are denoted as positive.
B. Negative set (pseudo-PAS sequences): For the negative set, we used bedtools complement to find regions lying outside the 1,000 bp upstream and downstream of the positive poly(A) hexamer signal. The HOMER tool was used to find matches for the 12 most frequent human poly(A) variants. Since the number of matches was very large, sampling was used to select 37,516 pseudo-PAS sequences. Sampling was done from each chromosome proportionally to the lengths of the chromosomes and also to the expected frequency of the poly(A) variants. Out of these predictions, for each PAS hexamer, we selected the same number of pseudo-PAS sequences as in the positive set.
Training and testing sets: We randomly selected 20% of the sequences from each of the positive and negative datasets as independent test data. The testing set thus consisted of 15,020 sequences. The remaining data represented the training set, which consisted of 60,012 sequences. Both datasets are balanced with respect to true PAS and pseudo-PAS sequences.
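The 606 bp sequence length quoted in the dataset description follows from the window arithmetic: 300 bp upstream, the 6 bp hexamer, and 300 bp downstream. The sketch below illustrates that arithmetic in plain Python on a toy sequence. It is not the authors' pipeline and does not call bedtools; the function name `slop_window`, the toy chromosome, and the choice of `AATAAA` as the example hexamer are assumptions made only for illustration.

```python
def slop_window(start, end, flank, chrom_len):
    """Extend a 0-based, half-open interval [start, end) by `flank` bases
    on each side, clipping to the chromosome bounds (hypothetical helper)."""
    return max(0, start - flank), min(chrom_len, end + flank)


# Toy "chromosome" (assumed data): 400 filler bases, one hexamer, 400 filler bases.
hexamer = "AATAAA"
chrom = "C" * 400 + hexamer + "G" * 400

# Locate the hexamer and build the 300 bp upstream / 300 bp downstream window.
start = chrom.index(hexamer)
end = start + len(hexamer)
w_start, w_end = slop_window(start, end, flank=300, chrom_len=len(chrom))

# Slice out the windowed sequence, analogous to extracting a FASTA record.
window = chrom[w_start:w_end]
print(len(window))  # 300 + 6 + 300 = 606
```

Note that a hexamer closer than 300 bp to a chromosome end would yield a shorter window after clipping; how the original dataset handled such cases is not stated in the description.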
Viral metagenomics: Analysis of begomoviruses by Illumina high-throughput sequencing
Idris, Ali; Al-Saleh, Mohammed; Piatek, Marek J.; Al-Shahwan, Ibrahim; Ali, Shahjahan; Brown, Judith K. (Viruses, MDPI AG, 2014-03-12) [Article]
Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina Next Generation Sequencing (NGS). The CASAVA and SeqMan NGen programs were used, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at a lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions. © 2014 by the authors; licensee MDPI, Basel, Switzerland.