    Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    Files
    • Article-PLoS_ONE-Read_lengt-2010.pdf (PDF, 694.6Kb): Article - Full Text
    • Supplement_1_-_PLoS_ONE-Read_lengt-2010.pone.0011518.s001.pdf (PDF, 28.95Kb): Supplemental File 1
    • Supplement_2_-_PLoS_ONE-Read_lengt-2010.pone.0011518.s002.tif (TIFF image, 693.3Kb): Supplemental File 2
    • Supplement_3_-_PLoS_ONE-Read_lengt-2010.pone.0011518.s004.tif (TIFF image, 1.014Mb): Supplemental File 3
    • Supplement_4_-_PLoS_ONE-Read_lengt-2010.pone.0011518.s003.xls (Microsoft Excel, 278.5Kb): Supplemental File 4
    Type
    Article
    Authors
    Cahill, Matt J.
    Köser, Claudio U.
    Ross, Nicholas E.
    Archer, John A.C.
    KAUST Department
    Biological and Environmental Sciences and Engineering (BESE) Division
    Computational Bioscience Research Center (CBRC)
    Date
    2010-07-12
    Permanent link to this record
    http://hdl.handle.net/10754/325284
    
    Abstract
    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats.

    Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various read lengths. Notably, unpaired reads of only 150 nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads.

    Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

    © 2010 Cahill et al.
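    The abstract does not spell out the repeat-assembly model, so the following is only a toy illustration of the idea, not the authors' method. It assumes that an unpaired read of length L can bridge an exact repeat copy only if it covers the whole copy plus one unique base on each side; copies of length R >= L - 1 are therefore treated as unresolvable, and each such copy is counted as one repeat-induced gap. The function name count_unresolvable_repeat_copies is hypothetical, and reverse-complement repeats, inexact repeats, circular chromosomes and sequencing errors are all ignored.

```python
from collections import Counter


def count_unresolvable_repeat_copies(genome: str, read_length: int) -> int:
    """Toy count of repeat copies that unpaired reads of one length cannot bridge.

    Assumption (hypothetical model, not the published one): a read bridges a
    repeat copy only if it spans the copy plus one unique flanking base on each
    side, so exact repeats of length >= read_length - 1 stay unresolved.
    Only the forward strand of a linear sequence is examined.
    """
    k = read_length - 1
    if k < 1 or len(genome) < k:
        return 0
    # Every exact repeat of length >= k contains at least one k-mer seen twice.
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    repeated = [counts[kmer] > 1 for kmer in kmers]
    # Each maximal run of repeated k-mer start positions is treated as one
    # repeat copy (adjacent or overlapping repeats may merge into one run).
    copies = 0
    previous = False
    for flag in repeated:
        if flag and not previous:
            copies += 1
        previous = flag
    return copies


genome = "CCCCGATTACATTTTGATTACAGGGG"  # toy sequence: GATTACA (7 bp) occurs twice
for read_length in (8, 9):
    print(read_length, count_unresolvable_repeat_copies(genome, read_length))
# 8 2  -> 8 nt reads cannot span the 7 bp repeat with unique flanks on both sides
# 9 0  -> 9 nt reads can, so neither copy induces a gap
```

    Repeating the count over a range of read lengths, and over many genomes, gives the kind of length-versus-gaps comparison the abstract describes; in this toy version each genome needs only one k-mer table per read length.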
    Citation
    Cahill MJ, Köser CU, Ross NE, Archer JAC (2010) Read Length and Repeat Resolution: Exploring Prokaryote Genomes Using Next-Generation Sequencing Technologies. PLoS ONE 5: e11518. doi:10.1371/journal.pone.0011518.
    Publisher
    Public Library of Science (PLoS)
    Journal
    PLoS ONE
    DOI
    10.1371/journal.pone.0011518
    PubMed ID
    20634954
    PubMed Central ID
    PMC2902515
    Collections
    Articles; Biological and Environmental Science and Engineering (BESE) Division; Computational Bioscience Research Center (CBRC)

    Related articles

    • SeqEntropy: genome-wide assessment of repeats for short read sequencing.
      Authors: Chu HT, Hsiao WW, Tsao TT, Hsu DF, Chen CC, Lee SA, Kao CY
      Issue date: 2013
    • Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies.
      Authors: Wetzel J, Kingsford C, Pop M
      Issue date: 2011 Apr 13
    • Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler.
      Authors: Zerbino DR, McEwen GK, Margulies EH, Birney E
      Issue date: 2009 Dec 22
    • Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.
      Authors: Desai A, Marwah VS, Yadav A, Jha V, Dhaygude K, Bangar U, Kulkarni V, Jere A
      Issue date: 2013
    • SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome.
      Authors: Stadermann KB, Weisshaar B, Holtgräwe D
      Issue date: 2015 Sep 16

    Related items

    Showing items related by title, author, creator and subject.

    • Convolutional Sequence to Sequence Learning to Improve Nanopore Basecalling Efficiency

      Alqatari, Ammar (2018-08-28) [Poster]
    • Data for : Poly(A) Dataset for PAS sequences and pseudo-PAS sequences Classification (fasta format)

      Albalawi, Fahad; Chahid, Abderrazak; Guo, Xingang; Albaradei, Somayah; Magana-Mora, Arturo; Jankovic, Boris R.; Uludag, Mahmut; Van Neste, Christophe; Essack, Magbubah; Laleg-Kirati, Taous-Meriem; Bajic, Vladimir B. (KAUST Research Repository, 2018-11-15) [Dataset]
      This dataset contains DNA sequences of the human genome hg38 from the GENCODE folder on the EBI FTP server (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/GRCh38.primary_assembly.genome.fa.gz).
      A. Positive set (PAS sequences): Using the GENCODE annotation for poly(A) (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.polyAs.gff3.gz), we selected the poly(A) signal annotation. Using the bedtools slop option, we extended each region 300 bp upstream and 300 bp downstream of the poly(A) hexamer. With the bedtools getfasta option, we extracted 606 bp FASTA sequences from these regions. After eliminating duplicates, we obtained 37,516 presumed true functional poly(A) signal (PAS) sequences. Sequences from this set are denoted as positive.
      B. Negative set (pseudo-PAS sequences): For the negative set, we used bedtools complement to look for regions outside the span covering 1,000 bp upstream and downstream of each positive poly(A) hexamer signal. The HOMER tool was used to find matches for the 12 most frequent human poly(A) variants. Because the number of matches was very large, sampling was used to select 37,516 pseudo-PAS sequences. Sampling was done from each chromosome in proportion to chromosome length and to the expected frequency of the poly(A) variants. From these predictions, for each PAS hexamer, we selected the same number of pseudo-PAS sequences as in the positive set.
      Training and testing sets: We randomly selected 20% of the sequences from each of the positive and negative sets as independent test data. The testing set thus consisted of 15,020 sequences. The remaining data formed the training set, which consisted of 60,012 sequences. Both sets are balanced with respect to true PAS and pseudo-PAS sequences.
    • Next-Generation Sequencing at High Sequencing Depth as a Tool to Study the Evolution of Metastasis Driven by Genetic Change Events of Lung Squamous Cell Carcinoma

      Mansour, Hicham; Ouhajjou, Abdelhak; Bajic, Vladimir B.; Incitti, Roberto (Frontiers in Oncology, Frontiers Media SA, 2020-08-05) [Article]
      Background: The aim of this study is to report tumoral genetic mutations observed at high sequencing depth in a lung squamous cell carcinoma (SqCC) sample. We describe the findings and differences in genetic mutations that were studied by deep next-generation sequencing of the primary tumor and liver metastasis samples. In this report, we also discuss how these differences may be involved in determining the tumor progression leading to the metastasis stage.
      Methods: We followed one lung SqCC patient who underwent FDG-PET imaging before and after three months of treatment. We sequenced 26 well-known cancer-related genes, at an average of ~6,000× sequencing coverage, in two spatially distinct regions: one from a primary lung tumor metastasis and the other from a distal liver metastasis, which was present before the treatment.
      Results: A total of 3,922,196 read pairs were obtained across the sequenced locations of both samples. Merged mapped reads showed several variants, from which we selected 36 high-confidence calls. While we found 83% genetic concordance between the distal metastasis and the primary tumor, six variants showed substantial discordance. In the liver metastasis sample, we observed three de novo genetic changes, two in the FGFR3 gene and one in the CDKN2A gene, and the frequency of one variant in the FGFR2 gene had increased. Two genetic variants in the HRAS gene, which were initially present in the primary tumor, were completely lost in the liver tumor. The discordant variants have the following coding consequences: FGFR3 (c.746C>G, p.Ser249Cys), CDKN2A (c.47_50delTGGC, p.Leu16Profs*9), and HRAS (c.182A>C, p.Gln61Pro). The pathogenicity prediction scores for the acquired variants, assessed using several databases, classified these variants as pathogenic, with a gain of function for FGFR3 and a loss of function for CDKN2A. Patient follow-up using 18F-FDG PET/CT imaging before and after four cycles of treatment shows discordant tumor progression in the liver metastasis compared with the primary lung tumor.
      Conclusions: Our results report the occurrence of several genetic changes between the primary tumor and distant liver metastasis in lung SqCC, among which non-silent mutations may be associated with tumor evolution during metastasis.