An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing

Handle URI:
http://hdl.handle.net/10754/626744
Title:
An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing
Authors:
Han, Renmin; Li, Yu; Wang, Sheng; Gao, Xin (0000-0002-7108-3574)
Abstract:
Long-reads, point-of-care, and PCR-free are the promises brought by nanopore sequencing. Among various steps in nanopore data analysis, the global mapping between the raw electrical current signal sequence and the expected signal sequence from the pore model serves as the key building block to base calling, reads mapping, variant identification, and methylation detection. However, the ultra-long reads of nanopore sequencing and an order of magnitude difference in the sampling speeds of the two sequences make the classical dynamic time warping (DTW) and its variants infeasible to solve the problem. Here, we propose a novel multi-level DTW algorithm, cwDTW, based on continuous wavelet transforms with different scales of the two signal sequences. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. Then the peaks and nadirs of the transformed sequences are extracted to form feature sequences with similar lengths, which can be easily mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path in the latter. Comprehensive experiments on two real nanopore datasets on human and on Pandoraea pnomenusa, as well as two benchmark datasets from previous studies, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW can almost always generate warping paths that are very close to the original DTW, which are remarkably more accurate than the state-of-the-art methods including FastDTW and PrunedDTW. Meanwhile, on the real nanopore datasets, cwDTW is about 440 times faster than FastDTW and 3000 times faster than the original DTW. Our program is available at https://github.com/realbigws/cwDTW.
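The coarse-to-fine idea in the abstract (align short low-resolution versions of the two signals first, then project that warping path to the full resolution and search only inside a band around it) can be illustrated with a minimal Python sketch. This is not the cwDTW implementation: block averaging is swapped in for the continuous wavelet transform and the peak/nadir feature extraction, only two levels are used rather than a recursion, and every name here (`dtw`, `block_mean`, `project_window`, `coarse_to_fine_dtw`, `bx`, `by`, `radius`) is hypothetical.

```python
import math


def dtw(x, y, window=None):
    """Classic DTW with an optional search window.

    window is a list with one inclusive (lo, hi) column range per row of x;
    None means the full len(x)-by-len(y) matrix is searched.
    Returns (total_cost, warping_path).
    """
    n, m = len(x), len(y)
    if window is None:
        window = [(0, m - 1)] * n
    inf = math.inf
    cost = [dict() for _ in range(n)]  # sparse rows: only cells inside the window
    for i in range(n):
        lo, hi = window[i]
        for j in range(lo, hi + 1):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    cost[i - 1].get(j, inf) if i > 0 else inf,
                    cost[i].get(j - 1, inf) if j > 0 else inf,
                    cost[i - 1].get(j - 1, inf) if i > 0 and j > 0 else inf,
                )
            cost[i][j] = abs(x[i] - y[j]) + best
    # Backtrack from the last cell to (0, 0) along the cheapest predecessors.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((cost[i - 1].get(j - 1, inf), i - 1, j - 1))
        if i > 0:
            steps.append((cost[i - 1].get(j, inf), i - 1, j))
        if j > 0:
            steps.append((cost[i].get(j - 1, inf), i, j - 1))
        _, i, j = min(steps)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path


def block_mean(signal, block):
    """Low-resolution stand-in (assumption): average consecutive blocks."""
    out = []
    for k in range(0, len(signal), block):
        chunk = signal[k:k + block]
        out.append(sum(chunk) / len(chunk))
    return out


def project_window(coarse_path, n, m, bx, by, radius):
    """Map each coarse cell to the fine cells it covers, widened by radius."""
    lo, hi = [m] * n, [-1] * n
    for a, b in coarse_path:
        c_lo = max(0, b * by - radius)
        c_hi = min(m - 1, (b + 1) * by - 1 + radius)
        for i in range(a * bx, min((a + 1) * bx, n)):
            lo[i] = min(lo[i], c_lo)
            hi[i] = max(hi[i], c_hi)
    return list(zip(lo, hi))


def coarse_to_fine_dtw(x, y, bx, by, radius=2):
    """Two-level DTW: full search at low resolution, banded search at full."""
    _, coarse_path = dtw(block_mean(x, bx), block_mean(y, by))
    window = project_window(coarse_path, len(x), len(y), bx, by, radius)
    return dtw(x, y, window)
```

Choosing a larger block size for the densely sampled signal (`bx`) than for the expected signal (`by`) brings the two coarse sequences to similar lengths, which loosely mirrors the unbalanced sampling rates described above. Because the banded search visits only a subset of the cells, its cost can never be lower than that of the unconstrained DTW.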
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division; Computer Science Program; Computational Bioscience Research Center (CBRC)
Citation:
Han R, Li Y, Wang S, Gao X (2017) An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing. Available: http://dx.doi.org/10.1101/238857.
Publisher:
Cold Spring Harbor Laboratory
Issue Date:
24-Dec-2017
DOI:
10.1101/238857
Type:
Preprint
Sponsors:
We thank Minh Duc Cao, Lachlan J.M. Coin, Louise Roddam, and Tania Duarte for providing the nanopore sequencing data for the Pandoraea pnomenusa sample.
Additional Links:
https://www.biorxiv.org/content/early/2017/12/23/238857.1
Appears in Collections:
Other/General Submission; Computer Science Program; Computational Bioscience Research Center (CBRC); Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Han, Renmin | en
dc.contributor.author | Li, Yu | en
dc.contributor.author | Wang, Sheng | en
dc.contributor.author | Gao, Xin | en
dc.date.accessioned | 2018-01-15T06:10:39Z | -
dc.date.available | 2018-01-15T06:10:39Z | -
dc.date.issued | 2017-12-24 | en
dc.identifier.citation | Han R, Li Y, Wang S, Gao X (2017) An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing. Available: http://dx.doi.org/10.1101/238857. | en
dc.identifier.doi | 10.1101/238857 | en
dc.identifier.uri | http://hdl.handle.net/10754/626744 | -
dc.description.abstract | Long-reads, point-of-care, and PCR-free are the promises brought by nanopore sequencing. Among various steps in nanopore data analysis, the global mapping between the raw electrical current signal sequence and the expected signal sequence from the pore model serves as the key building block to base calling, reads mapping, variant identification, and methylation detection. However, the ultra-long reads of nanopore sequencing and an order of magnitude difference in the sampling speeds of the two sequences make the classical dynamic time warping (DTW) and its variants infeasible to solve the problem. Here, we propose a novel multi-level DTW algorithm, cwDTW, based on continuous wavelet transforms with different scales of the two signal sequences. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. Then the peaks and nadirs of the transformed sequences are extracted to form feature sequences with similar lengths, which can be easily mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path in the latter. Comprehensive experiments on two real nanopore datasets on human and on Pandoraea pnomenusa, as well as two benchmark datasets from previous studies, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW can almost always generate warping paths that are very close to the original DTW, which are remarkably more accurate than the state-of-the-art methods including FastDTW and PrunedDTW. Meanwhile, on the real nanopore datasets, cwDTW is about 440 times faster than FastDTW and 3000 times faster than the original DTW. Our program is available at https://github.com/realbigws/cwDTW. | en
dc.description.sponsorship | We thank Minh Duc Cao, Lachlan J.M. Coin, Louise Roddam, and Tania Duarte for providing the nanopore sequencing data for the Pandoraea pnomenusa sample. | en
dc.publisher | Cold Spring Harbor Laboratory | en
dc.relation.url | https://www.biorxiv.org/content/early/2017/12/23/238857.1 | en
dc.rights | The copyright holder for this preprint is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. | en
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en
dc.title | An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing | en
dc.type | Preprint | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.contributor.department | Computer Science Program | en
dc.contributor.department | Computational Bioscience Research Center (CBRC) | en
dc.eprint.version | Pre-print | en
kaust.author | Han, Renmin | en
kaust.author | Li, Yu | en
kaust.author | Wang, Sheng | en
kaust.author | Gao, Xin | en