Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies
KAUST Department: Biological and Environmental Sciences and Engineering (BESE) Division
Computational Bioscience Research Center (CBRC)
Permanent link to this record: http://hdl.handle.net/10754/325284
Abstract
Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats.
Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads.
Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.
© 2010 Cahill et al.
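The abstract's central idea, that a repeat is unresolvable when no single read can span it together with unique sequence on both sides, can be illustrated with a small sketch. This is not the authors' model or software: the function name `unresolvable_repeat_copies`, the restriction to exact forward-strand repeats on a linear sequence, and the choice of k = read length − 1 as the threshold are all assumptions made here for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def unresolvable_repeat_copies(genome: str, read_length: int) -> list[tuple[int, int]]:
    """Return (start, end) intervals of exact repeat copies too long to bridge.

    Assumption (hypothetical simplification): a read of length L bridges a
    repeat only if it covers the whole repeat plus one unique base on each
    side, so any exact repeat of length >= L - 1 is treated as unresolvable.
    Only the forward strand of a linear sequence is examined.
    """
    k = read_length - 1
    if k < 1 or k > len(genome):
        return []

    # Index every k-mer by the positions at which it starts.
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    # Start positions of k-mers that occur more than once.
    repeated_starts = sorted(
        i for starts in positions.values() if len(starts) > 1 for i in starts
    )

    # Merge runs of adjacent start positions into one interval per repeat copy.
    regions: list[tuple[int, int]] = []
    for s in repeated_starts:
        if regions and s == regions[-1][1] - k + 1:
            regions[-1] = (regions[-1][0], s + k)
        else:
            regions.append((s, s + k))
    return regions


# Toy sequence: two copies of a 7-base repeat separated by unique sequence.
toy = "AAC" + "GATTACA" + "CCG" + "GATTACA" + "TTG"
short_read_regions = unresolvable_repeat_copies(toy, read_length=5)
long_read_regions = unresolvable_repeat_copies(toy, read_length=9)
```

In this toy case the 5 nt reads leave both copies of the repeat flagged, while 9 nt reads flag nothing because each read can cover the 7-base repeat and a unique base on either side. The count of flagged copies is only a rough proxy for repeat-induced gaps; the published technique should be consulted for the actual model.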
Citation: Cahill MJ, Köser CU, Ross NE, Archer JAC (2010) Read Length and Repeat Resolution: Exploring Prokaryote Genomes Using Next-Generation Sequencing Technologies. PLoS ONE 5: e11518. doi:10.1371/journal.pone.0011518.
Publisher: Public Library of Science (PLoS)
PubMed Central ID: PMC2902515
- SeqEntropy: genome-wide assessment of repeats for short read sequencing.
- Authors: Chu HT, Hsiao WW, Tsao TT, Hsu DF, Chen CC, Lee SA, Kao CY
- Issue date: 2013
- Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler.
- Authors: Zerbino DR, McEwen GK, Margulies EH, Birney E
- Issue date: 2009 Dec 22
- Efficient frequency-based de novo short-read clustering for error trimming in next-generation sequencing.
- Authors: Qu W, Hashimoto S, Morishita S
- Issue date: 2009 Jul
- De novo sequencing of plant genomes using second-generation technologies.
- Authors: Imelfort M, Edwards D
- Issue date: 2009 Nov
- 6-10× pyrosequencing is a practical approach for whole prokaryote genome studies.
- Authors: Li J, Jiang J, Leung FC
- Issue date: 2012 Feb 15
Showing items related by title, author, creator and subject.
Characterization and gene expression analysis of the cir multi-gene family of Plasmodium chabaudi chabaudi (AS)
Lawton, Jennifer; Brugat, Thibaut; Yan, Yam Xue; Reid, Adam James; Böhme, Ulrike; Otto, Thomas Dan; Pain, Arnab; Jackson, Andrew; Berriman, Matthew; Cunningham, Deirdre; Preiser, Peter; Langhorne, Jean (BMC Genomics, Springer Nature, 2012-03-29) [Article]
Background: The pir genes comprise the largest multi-gene family in Plasmodium, with members found in P. vivax, P. knowlesi and the rodent malaria species. Despite comprising up to 5% of the genome, little is known about the functions of the proteins encoded by pir genes. P. chabaudi causes chronic infection in mice, which may be due to antigenic variation. In this model, pir genes are called cirs and may be involved in this mechanism, allowing evasion of host immune responses. In order to fully understand the role(s) of CIR proteins during P. chabaudi infection, a detailed characterization of the cir gene family was required.
Results: The cir repertoire was annotated and a detailed bioinformatic characterization of the encoded CIR proteins was performed. Two major sub-families were identified, which have been named A and B. Members of each sub-family displayed different amino acid motifs, and were thus predicted to have undergone functional divergence. In addition, the expression of the entire cir repertoire was analyzed via RNA sequencing and microarray. Up to 40% of the cir gene repertoire was expressed in the parasite population during infection, and dominant cir transcripts could be identified. In addition, some differences were observed in the pattern of expression between the cir subgroups at the peak of P. chabaudi infection. Finally, specific cir genes were expressed at different time points during asexual blood stages.
Conclusions: In conclusion, the large number of cir genes and their expression throughout the intraerythrocytic cycle of development indicates that CIR proteins are likely to be important for parasite survival. In particular, the detection of dominant cir transcripts at the peak of P. chabaudi infection supports the idea that CIR proteins are expressed, and could perform important functions in the biology of this parasite. Further application of the methodologies described here may allow the elucidation of CIR sub-family A and B protein functions, including their contribution to antigenic variation and immune evasion.
© 2012 Lawton et al; licensee BioMed Central Ltd.
Transcriptome sequencing of the blind subterranean mole rat, Spalax galili: Utility and potential for the discovery of novel evolutionary patterns
Malik, Assaf; Korol, Abraham; Hübner, Sariel; Hernandez, Alvaro G.; Thimmapuram, Jyothi; Ali, Shahjahan; Glaser, Fabian; Paz, Arnon; Avivi, Aaron; Band, Mark (PLoS ONE, Public Library of Science (PLoS), 2011-08-12) [Article]
The blind subterranean mole rat (Spalax ehrenbergi superspecies) is a model animal for survival under extreme environments due to its ability to live in underground habitats under severe hypoxic stress and darkness. Here we report the transcriptome sequencing of Spalax galili, a chromosomal type of S. ehrenbergi. cDNA pools from muscle and brain tissues isolated from animals exposed to hypoxic and normoxic conditions were sequenced using Sanger, GS FLX, and GS FLX Titanium technologies. Assembly of the sequences yielded over 51,000 isotigs with homology to ~12,000 mouse, rat or human genes. Based on these results, it was possible to detect large numbers of splice variants, SNPs, and novel transcribed regions. In addition, multiple differential expression patterns were detected between tissues and treatments. The results presented here will serve as a valuable resource for future studies aimed at identifying genes and gene regions evolved during the adaptive radiation associated with underground life of the blind mole rat.
© 2011 Malik et al.
Identification and Analysis of Red Sea Mangrove (Avicennia marina) microRNAs by High-Throughput Sequencing and Their Association with Stress Responses
Khraiwesh, Basel; Pugalenthi, Ganesan; Fedoroff, Nina V. (PLoS ONE, Public Library of Science (PLoS), 2013-04-08) [Article]
Although RNA silencing has been studied primarily in model plants, advances in high-throughput sequencing technologies have enabled profiling of the small RNA components of many more plant species, providing insights into the ubiquity and conservatism of some miRNA-based regulatory mechanisms. Small RNAs of 20 to 24 nucleotides (nt) are important regulators of gene transcript levels by either transcriptional or by posttranscriptional gene silencing, contributing to genome maintenance and controlling a variety of developmental and physiological processes. Here, we used deep sequencing and molecular methods to create an inventory of the small RNAs in the mangrove species, Avicennia marina. We identified 26 novel mangrove miRNAs and 193 conserved miRNAs belonging to 36 families. We determined that 2 of the novel miRNAs were produced from known miRNA precursors and 4 were likely to be species-specific by the criterion that we found no homologs in other plant species. We used qRT-PCR to analyze the expression of miRNAs and their target genes in different tissue sets and some demonstrated tissue-specific expression. Furthermore, we predicted potential targets of these putative miRNAs based on a sequence homology and experimentally validated through endonucleolytic cleavage assays. Our results suggested that expression profiles of miRNAs and their predicted targets could be useful in exploring the significance of the conservation patterns of plants, particularly in response to abiotic stress. Because of their well-developed abilities in this regard, mangroves and other extremophiles are excellent models for such exploration.
© 2013 Khraiwesh et al.