Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures

Handle URI:
http://hdl.handle.net/10754/334594
Title:
Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures
Authors:
Kleftogiannis, Dimitrios A. (0000-0003-1086-821X); Kalnis, Panos (0000-0002-5060-1360); Bajic, Vladimir B. (0000-0001-5435-4750)
Abstract:
A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computational Bioscience Research Center (CBRC)
Citation:
Kleftogiannis D, Kalnis P, Bajic VB (2013) Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures. PLoS ONE 8: e75505. doi:10.1371/journal.pone.0075505.
Publisher:
Public Library of Science (PLoS)
Journal:
PLoS ONE
Issue Date:
27-Sep-2013
DOI:
10.1371/journal.pone.0075505
PubMed ID:
24086547
PubMed Central ID:
PMC3785575
Type:
Article
ISSN:
1932-6203
Appears in Collections:
Articles; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Kleftogiannis, Dimitrios A. | en
dc.contributor.author | Kalnis, Panos | en
dc.contributor.author | Bajic, Vladimir B. | en
dc.date.accessioned | 2014-11-11T14:31:17Z | -
dc.date.available | 2014-11-11T14:31:17Z | -
dc.date.issued | 2013-09-27 | en
dc.identifier.citation | Kleftogiannis D, Kalnis P, Bajic VB (2013) Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures. PLoS ONE 8: e75505. doi:10.1371/journal.pone.0075505. | en
dc.identifier.issn | 1932-6203 | en
dc.identifier.pmid | 24086547 | en
dc.identifier.doi | 10.1371/journal.pone.0075505 | en
dc.identifier.uri | http://hdl.handle.net/10754/334594 | en
dc.description.abstract | A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly. | en
dc.language.iso | en | en
dc.publisher | Public Library of Science (PLoS) | en
dc.rights | This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. | en
dc.title | Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.identifier.journal | PLoS ONE | en
dc.identifier.pmcid | PMC3785575 | en
dc.eprint.version | Publisher's Version/PDF | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Kleftogiannis, Dimitrios A. | en

All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.