Viral metagenomics: Analysis of begomoviruses by Illumina high-throughput sequencing
Piatek, Marek J.
Brown, Judith K.
KAUST Department: Bioscience Core Lab
Biological and Environmental Sciences and Engineering (BESE) Division
Desert Agriculture Initiative
Permanent link to this record: http://hdl.handle.net/10754/325365
Abstract: Traditional DNA sequencing methods are inefficient, lack the ability to discern the least abundant viral sequences, and are ineffective for determining the extent of variability in viral populations. Here, populations of single-stranded DNA plant begomoviral genomes and their associated beta- and alpha-satellite molecules (virus-satellite complexes) (genus, Begomovirus; family, Geminiviridae) were enriched from total nucleic acids isolated from symptomatic, field-infected plants, using rolling circle amplification (RCA). Enriched virus-satellite complexes were subjected to Illumina-Next Generation Sequencing (NGS). CASAVA and SeqMan NGen programs were implemented, respectively, for quality control and for de novo and reference-guided contig assembly of viral-satellite sequences. The authenticity of the begomoviral sequences, and the reproducibility of the Illumina-NGS approach for begomoviral deep sequencing projects, were validated by comparing NGS results with those obtained using traditional molecular cloning and Sanger sequencing of viral components and satellite DNAs, also enriched by RCA or amplified by polymerase chain reaction. As the use of NGS approaches, together with advances in software development, makes deep sequence coverage possible at a lower cost, the approach described herein will streamline the exploration of begomovirus diversity and population structure from naturally infected plants, irrespective of viral abundance. This is the first report of the implementation of Illumina-NGS to explore the diversity and identify begomoviral-satellite SNPs directly from plants naturally infected with begomoviruses under field conditions. © 2014 by the authors; licensee MDPI, Basel, Switzerland.
Citation: Idris A, Al-Saleh M, Piatek M, Al-Shahwan I, Ali S, et al. (2014) Viral Metagenomics: Analysis of Begomoviruses by Illumina High-Throughput Sequencing. Viruses 6: 1219-1236. doi:10.3390/v6031219.
PubMed Central ID: PMC3970147
- Reconstruction and Characterization of Full-Length Begomovirus and Alphasatellite Genomes Infecting Pepper through Metagenomics.
- Authors: Bornancini VA, Irazoqui JM, Flores CR, Vaghi Medina CG, Amadio AF, López Lambertini PM
- Issue date: 2020 Feb 11
- Begomovirus-Associated Satellite DNA Diversity Captured Through Vector-Enabled Metagenomic (VEM) Surveys Using Whiteflies (Aleyrodidae).
- Authors: Rosario K, Marr C, Varsani A, Kraberger S, Stainton D, Moriones E, Polston JE, Breitbart M
- Issue date: 2016 Feb 2
- Metagenomics of Neotropical Single-Stranded DNA Viruses in Tomato Cultivars with and without the Ty-1 Gene.
- Authors: de Nazaré Almeida Dos Reis L, Fonseca MEN, Ribeiro SG, Naito FYB, Boiteux LS, Pereira-Carvalho RC
- Issue date: 2020 Jul 28
- The first DNA 1-like alpha satellites in association with New World begomoviruses in natural infections.
- Authors: Paprotka T, Metzler V, Jeske H
- Issue date: 2010 Sep 1
- A melting pot of Old World begomoviruses and their satellites infecting a collection of Gossypium species in Pakistan.
- Authors: Nawaz-ul-Rehman MS, Briddon RW, Fauquet CM
- Issue date: 2012
Showing items related by title, author, creator and subject.
Next-Generation Sequencing at High Sequencing Depth as a Tool to Study the Evolution of Metastasis Driven by Genetic Change Events of Lung Squamous Cell Carcinoma
Mansour, Hicham; Ouhajjou, Abdelhak; Bajic, Vladimir B.; Incitti, Roberto (Frontiers in Oncology, Frontiers Media SA, 2020-08-05) [Article]
Background: The aim of this study is to report tumoral genetic mutations observed at high sequencing depth in a lung squamous cell carcinoma (SqCC) sample. We describe the findings and differences in genetic mutations that were studied by deep next-generation sequencing methods on the primary tumor and liver metastasis samples. In this report, we also discuss how these differences may be involved in determining the tumor progression leading to the metastasis stage. Methods: We followed one lung SqCC patient who underwent FDG-PET scan imaging, before and after three months of treatment. We sequenced 26 well-known cancer-related genes, at an average of ~6,000× sequencing coverage, in two spatially distinct regions, one from a primary lung tumor metastasis and the other from a distal liver metastasis, which was present before the treatment. Results: A total of 3,922,196 read pairs were obtained across the two sequenced sample locations. Merged mapped reads showed several variants, from which we selected 36 with high-confidence calls. While we found 83% genetic concordance between the distal metastasis and primary tumor, six variants presented substantial discordance. In the liver metastasis sample, we observed three de novo genetic changes, two on the FGFR3 gene and one on the CDKN2A gene, and the frequency of one variant found on the FGFR2 gene increased. Two genetic variants in the HRAS gene, which were present initially in the primary tumor, were completely lost in the liver tumor. The discordant variants have coding consequences as follows: FGFR3 (c.746C>G, p.Ser249Cys), CDKN2A (c.47_50delTGGC, p.Leu16Profs*9), and HRAS (c.182A>C, p.Gln61Pro).
The pathogenicity prediction scores for the acquired variants, assessed using several databases, reported these variants as pathogenic, with a gain of function for FGFR3 and a loss of function for CDKN2A. The patient follow-up using imaging with 18F-FDG PET/CT before and after four cycles of treatment shows discordant tumor progression in metastatic liver compared to primary lung tumor. Conclusions: Our results report the occurrence of several genetic changes between primary tumor and distant liver metastasis in lung SqCC, among which non-silent mutations may be associated with tumor evolution during metastasis.
Data for: Poly(A) Dataset for PAS sequences and pseudo-PAS sequences Classification (fasta format)
Albalawi, Fahad; Chahid, Abderrazak; Guo, Xingang; Albaradei, Somayah; Magana-Mora, Arturo; Jankovic, Boris R.; Uludag, Mahmut; Van Neste, Christophe; Essack, Magbubah; Laleg-Kirati, Taous-Meriem; Bajic, Vladimir B. (KAUST Research Repository, 2018-11-15) [Dataset]
This dataset contains DNA sequences of the human genome hg38 from the GENCODE folder at the EBI ftp server (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/GRCh38.primary_assembly.genome.fa.gz).
A - Positive set (PAS sequences): Using the GENCODE annotation for poly(A) (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.polyAs.gff3.gz), we selected the poly(A) signal annotation. Using the bedtools-slop option, we found regions extended 300 bp upstream and 300 bp downstream of the poly(A) hexamer. With the bedtools-getfasta option, we extracted 606 bp fasta sequences from these regions. After eliminating duplicates, we obtained 37’516 presumed true functional poly(A) signal (PAS) sequences. Sequences from this set will be denoted as positive.
B - Negative set (pseudo-PAS sequences): For the negative set, we looked for regions extended outside the region covering 1’000 bp upstream and downstream of the positive poly(A) hexamer signal using bedtools-complement. The Homer tool was used to find matches for the 12 most frequent human poly(A) variants. Since the number of matches was huge, sampling was used to select 37’516 pseudo-PAS sequences. Sampling was done from each chromosome proportionally to the lengths of the chromosomes and also to the expected frequency of the poly(A) variants. Out of these predictions, for each PAS hexamer, we selected the same number of pseudo-PAS sequences as in the positive set.
Training and testing sets: We selected randomly from each of the positive and negative datasets 20% of sequences for the independent test data.
The testing set thus consisted of 15’020 sequences. The remaining data represented the training set that consisted of 60’012 sequences. Both datasets are balanced relative to the true PAS and pseudo-PAS sequences.
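The window arithmetic in the dataset description above (a 6 bp hexamer plus 300 bp on each side gives 606 bp) can be sketched in a few lines. This is only an illustration of the interval extension that the description attributes to bedtools slop and getfasta, not the authors' pipeline: the function names `slop` and `extract_window`, the 0-based half-open coordinates, and the toy sequence are assumptions made for this sketch.

```python
from typing import Tuple

HEXAMER_LEN = 6   # length of the poly(A) signal hexamer
FLANK = 300       # bases added on each side, as stated in the description


def slop(start: int, end: int, chrom_len: int, flank: int = FLANK) -> Tuple[int, int]:
    """Extend a 0-based, half-open interval by `flank` on both sides.

    Hypothetical helper: the result is clamped to the chromosome bounds,
    so windows near a chromosome end come out shorter than 606 bp.
    """
    return max(0, start - flank), min(chrom_len, end + flank)


def extract_window(chrom_seq: str, hexamer_start: int, flank: int = FLANK) -> str:
    """Return the hexamer plus its flanks from an in-memory sequence (hypothetical helper)."""
    start, end = slop(hexamer_start, hexamer_start + HEXAMER_LEN, len(chrom_seq), flank)
    return chrom_seq[start:end]


# Set sizes reported in the description.
N_PER_CLASS = 37_516   # positive sequences, and the same number of negatives
TEST_SIZE = 15_020
TRAIN_SIZE = 60_012

if __name__ == "__main__":
    toy = "ACGT" * 250                      # 1,000 bp toy sequence, not real data
    window = extract_window(toy, 400)       # hexamer assumed to start at position 400
    print(len(window))                      # 300 + 6 + 300
    print(TEST_SIZE + TRAIN_SIZE == 2 * N_PER_CLASS)
```

The last check confirms that the reported test and training sizes add up to the 75,032 sequences in the two balanced sets, with the test set making up about 20% of the total.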