dc.contributor.author: Han, Renmin
dc.contributor.author: Wang, Sheng
dc.contributor.author: Gao, Xin
dc.date.accessioned: 2018-12-16T11:19:38Z
dc.date.available: 2018-12-16T11:19:38Z
dc.date.issued: 2018-12-10
dc.identifier.citation: Han R, Wang S, Gao X (2018) Searching and mapping genomic subsequences in nanopore raw signals through novel dynamic time warping algorithms. Available: http://dx.doi.org/10.1101/491456.
dc.identifier.doi: 10.1101/491456
dc.identifier.uri: http://hdl.handle.net/10754/630282
dc.description.abstract: Nanopore sequencing is a promising technology for generating ultra-long reads by directly measuring the electrical current signal as a DNA molecule passes through a nanopore. These ultra-long reads are critical for detecting large structural variations in the genome. However, it is challenging to use nanopore sequencing to identify single nucleotide polymorphisms (SNPs) or other modifications such as methylations, especially at low sequencing coverage, because of the high error rate in the base-called reads. Base-calling errors can be corrected through subsequence search: mapping a SNP-containing genomic region to the long nanopore raw signal sequences that contain this region and taking the consensus of these signals. Nevertheless, the ultra-long raw signals and the order-of-magnitude difference in sampling speed between the two sequences make traditional algorithms infeasible for this problem. Here we propose two novel algorithms, the direct subsequence dynamic time warping for nanopore raw signal search (DSDTWnano) and the continuous wavelet subsequence dynamic time warping for nanopore raw signal search (cwSDTWnano), to enable direct subsequence searching and exact mapping in nanopore raw signals. The proposed algorithms are based on the idea of subsequence-extended dynamic time warping and operate directly on the raw signals, without any loss of information. DSDTWnano produces highly accurate query results, and cwSDTWnano is an accelerated version of DSDTWnano that uses seeding and multi-scale coarsening of signals based on the continuous wavelet transform. Furthermore, a novel error function is proposed to specify the mapping accuracy between a genomic sequence and an electrical current signal sequence, which may serve as a standard criterion for further genome-to-signal mapping studies. Comprehensive experiments on three real-world nanopore datasets (human and lambda phage) demonstrate the efficiency and effectiveness of the proposed algorithms. Finally, we show the power of our algorithms in SNP detection at low coverage (20x) on E. coli, with a >95% detection rate. Our program is available at https://github.com/icthrm/cwSDTWnano.git.
dc.description.sponsorship: The authors thank Minh Duc Cao, Lachlan J.M. Coin, Louise Roddam and Tania Duarte for providing the nanopore sequencing data. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Awards No. FCC/1/1976-04, URF/1/2601-01, URF/1/3007-01, URF/1/3412-01, and URF/1/3450-01.
dc.publisher: Cold Spring Harbor Laboratory
dc.relation.url: https://www.biorxiv.org/content/early/2018/12/10/491456
dc.rights: Archived with thanks to bioRxiv
dc.title: Searching and mapping genomic subsequences in nanopore raw signals through novel dynamic time warping algorithms
dc.type: Preprint
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Computer Science Program
dc.contributor.department: Computational Bioscience Research Center (CBRC)
dc.eprint.version: Pre-print
kaust.person: Han, Renmin
kaust.person: Wang, Sheng
kaust.person: Gao, Xin
kaust.grant.number: FCC/1/1976-04
kaust.grant.number: URF/1/2601-01
kaust.grant.number: URF/1/3007-01
kaust.grant.number: URF/1/3412-01
kaust.grant.number: URF/1/3450-01
refterms.dateFOA: 2018-12-16T11:53:06Z
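
Illustrative note: the abstract describes algorithms built on subsequence-extended dynamic time warping. As a rough aid to readers, the following is a minimal sketch of the textbook subsequence DTW recurrence only. It is not DSDTWnano or cwSDTWnano, and it is not taken from the authors' repository. The function name `subsequence_dtw`, the absolute-difference local cost, and the toy inputs are assumptions made for illustration; the sketch also assumes the query has already been converted to expected signal values, and it does no normalization, seeding, or wavelet coarsening. It uses quadratic time and memory, so it would not scale to ultra-long raw signals.

```python
import math


def subsequence_dtw(query, signal):
    """Find the span of `signal` that best matches `query` under DTW.

    Returns (cost, start, end), where start and end are inclusive
    indices into `signal`. Hypothetical helper for illustration only.
    """
    n, m = len(query), len(signal)
    if n == 0 or m == 0:
        raise ValueError("query and signal must be non-empty")

    # cost[i][j]: minimal accumulated cost of aligning query[0..i]
    # to some span of signal that ends at index j.
    cost = [[math.inf] * m for _ in range(n)]
    # start[i][j]: index in signal where that best span begins.
    start = [[0] * m for _ in range(n)]

    # Free start: the first query sample may match any signal position
    # without paying for the skipped prefix of the signal.
    for j in range(m):
        cost[0][j] = abs(query[0] - signal[j])
        start[0][j] = j

    for i in range(1, n):
        # First column: only a vertical step is possible.
        cost[i][0] = cost[i - 1][0] + abs(query[i] - signal[0])
        start[i][0] = start[i - 1][0]
        for j in range(1, m):
            # Standard DTW steps: diagonal, vertical, horizontal.
            best, origin = min(
                (cost[i - 1][j - 1], start[i - 1][j - 1]),
                (cost[i - 1][j], start[i - 1][j]),
                (cost[i][j - 1], start[i][j - 1]),
            )
            cost[i][j] = best + abs(query[i] - signal[j])
            start[i][j] = origin

    # Free end: the match may stop at any signal position.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

For example, calling `subsequence_dtw([1.0, 5.0, 2.0], [0.0, 0.0, 1.0, 1.0, 5.0, 5.0, 5.0, 2.0, 0.0, 0.0])` locates the stretched copy of the query inside the longer signal; the free start and free end are what distinguish subsequence DTW from global DTW.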


Files in this item

Name: 491456.full.pdf
Size: 1.812Mb
Format: PDF
Description: Preprint
