DeepSimulator: a deep simulator for Nanopore sequencing

Handle URI:
http://hdl.handle.net/10754/626751
Title:
DeepSimulator: a deep simulator for Nanopore sequencing
Authors:
Li, Yu; Han, Renmin; Bi, Chongwei; Li, Mo (0000-0003-0827-8907); Wang, Sheng; Gao, Xin (0000-0002-7108-3574)
Abstract:
Motivation: Oxford Nanopore sequencing is a sequencing technology that has developed rapidly in recent years. To keep pace with the explosion of downstream data analysis tools, a versatile Nanopore sequencing simulator is needed to complement experimental data and to benchmark newly developed tools. However, all currently available simulators are based on simple statistics of the produced reads and therefore have difficulty capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Results: Here we propose a deep-learning-based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals with a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. Thorough experiments across four species show that the signals generated by our context-dependent model are more similar to experimentally obtained signals than those generated by the official context-independent pore model. For the simulated reads, we provide a parameter interface so that users can obtain reads with accuracies ranging from 83% to 97%. The reads generated with the default parameters have almost the same properties as real data. Two case studies demonstrate how DeepSimulator can benefit the development of tools for de novo assembly and low-coverage SNP detection. Availability: The software is freely available at https://github.com/lykaust15/DeepSimulator.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program; Biological and Environmental Sciences and Engineering (BESE) Division; Bioscience Program; Computational Bioscience Research Center (CBRC)
Citation:
Li Y, Han R, Bi C, Li M, Wang S, et al. (2017) DeepSimulator: a deep simulator for Nanopore sequencing. Available: http://dx.doi.org/10.1101/238683.
Publisher:
Cold Spring Harbor Laboratory
KAUST Grant Number:
URF/1/1976-04; URF/1/2602-01; URF/1/3007-01
Issue Date:
23-Dec-2017
DOI:
10.1101/238683
Type:
Preprint
Sponsors:
We thank Minh Duc Cao, Lachlan J.M. Coin, Louise Roddam, and Tania Duarte for providing the nanopore sequencing data for the lambda phage, E. coli, and Pandoraea pnomenusa samples. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Awards No. URF/1/1976-04, URF/1/2602-01, and URF/1/3007-01.
Additional Links:
https://www.biorxiv.org/content/early/2018/01/03/238683
Appears in Collections:
Other/General Submission; Bioscience Program; Computer Science Program; Computational Bioscience Research Center (CBRC); Biological and Environmental Sciences and Engineering (BESE) Division; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Li, Yu | en
dc.contributor.author | Han, Renmin | en
dc.contributor.author | Bi, Chongwei | en
dc.contributor.author | Li, Mo | en
dc.contributor.author | Wang, Sheng | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2018-01-15T06:10:40Z | -
dc.date.available | 2018-01-15T06:10:40Z | -
dc.date.issued | 2017-12-23 | en
dc.identifier.citation | Li Y, Han R, Bi C, Li M, Wang S, et al. (2017) DeepSimulator: a deep simulator for Nanopore sequencing. Available: http://dx.doi.org/10.1101/238683. | en
dc.identifier.doi | 10.1101/238683 | en
dc.identifier.uri | http://hdl.handle.net/10754/626751 | -
dc.description.abstract | Motivation: Oxford Nanopore sequencing is a sequencing technology that has developed rapidly in recent years. To keep pace with the explosion of downstream data analysis tools, a versatile Nanopore sequencing simulator is needed to complement experimental data and to benchmark newly developed tools. However, all currently available simulators are based on simple statistics of the produced reads and therefore have difficulty capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Results: Here we propose a deep-learning-based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals with a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. Thorough experiments across four species show that the signals generated by our context-dependent model are more similar to experimentally obtained signals than those generated by the official context-independent pore model. For the simulated reads, we provide a parameter interface so that users can obtain reads with accuracies ranging from 83% to 97%. The reads generated with the default parameters have almost the same properties as real data. Two case studies demonstrate how DeepSimulator can benefit the development of tools for de novo assembly and low-coverage SNP detection. Availability: The software is freely available at https://github.com/lykaust15/DeepSimulator. | en
dc.description.sponsorship | We thank Minh Duc Cao, Lachlan J.M. Coin, Louise Roddam, and Tania Duarte for providing the nanopore sequencing data for the lambda phage, E. coli, and Pandoraea pnomenusa samples. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Awards No. URF/1/1976-04, URF/1/2602-01, and URF/1/3007-01. | en
dc.publisher | Cold Spring Harbor Laboratory | en
dc.relation.url | https://www.biorxiv.org/content/early/2018/01/03/238683 | en
dc.rights | Archived with thanks to BioRxiv | en
dc.title | DeepSimulator: a deep simulator for Nanopore sequencing | en
dc.type | Preprint | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Biological and Environmental Sciences and Engineering (BESE) Division | en
dc.contributor.department | Bioscience Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.eprint.version | Pre-print | en
kaust.author | Li, Yu | en
kaust.author | Han, Renmin | en
kaust.author | Bi, Chongwei | en
kaust.author | Li, Mo | en
kaust.author | Wang, Sheng | en
kaust.author | Gao, Xin | en
kaust.grant.number | URF/1/1976-04 | en
kaust.grant.number | URF/1/2602-01 | en
kaust.grant.number | URF/1/3007-01 | en