Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures
KAUST Department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Computational Bioscience Research Center (CBRC)
Abstract: A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
Citation: Kleftogiannis D, Kalnis P, Bajic VB (2013) Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures. PLoS ONE 8: e75505. doi:10.1371/journal.pone.0075505.
Publisher: Public Library of Science (PLoS)
PubMed Central ID: PMC3785575
- Parallelized short read assembly of large genomes using de Bruijn graphs.
- Authors: Liu Y, Schmidt B, Maskell DL
- Issue date: 2011 Aug 25
- Assembler for de novo assembly of large genomes.
- Authors: Chu TC, Lu CH, Liu T, Lee GC, Li WH, Shih AC
- Issue date: 2013 Sep 3
- Next-generation sequencing and large genome assemblies.
- Authors: Henson J, Tischler G, Ning Z
- Issue date: 2012 Jun
- SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.
- Authors: Mariano DC, Pereira FL, Aguiar EL, Oliveira LC, Benevides L, Guimarães LC, Folador EL, Sousa TJ, Ghosh P, Barh D, Figueiredo HC, Silva A, Ramos RT, Azevedo VA
- Issue date: 2016 Dec 15
- Subset selection of high-depth next generation sequencing reads for de novo genome assembly using MapReduce framework.
- Authors: Fang CH, Chang YJ, Chung WC, Hsieh PH, Lin CY, Ho JM
- Issue date: 2015