NeuralPolish: a novel Nanopore polishing method based on alignment matrix construction and orthogonal Bi-GRU Networks
KAUST Department: Computer Science Program
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Embargo End Date: 2022-05-11
Permanent link to this record: http://hdl.handle.net/10754/669519
Abstract
Motivation: Oxford Nanopore sequencing, which produces long reads at low cost, has enabled many breakthroughs in genomics studies. However, the large number of errors in Nanopore genome assemblies affects the accuracy of genome analysis. Polishing is a procedure that corrects errors in a genome assembly and can improve the reliability of downstream analysis. However, the performance of existing polishing methods is still not satisfactory.
Results: We developed a novel polishing method, NeuralPolish, to correct the errors in assemblies based on alignment matrix construction and orthogonal Bi-GRU networks. In this method, we designed an alignment feature matrix to represent the read-to-assembly alignment. Each row of the matrix represents a read, and each column holds the bases aligned to one position of the contig. In the network architecture, a bi-directional GRU network extracts the sequence information inside each read by processing the alignment matrix row by row. The resulting feature matrix is then processed column by column by another bi-directional GRU network to calculate the probability distribution. Finally, a CTC decoder generates the polished sequence with a greedy algorithm. We tested on five real data sets with three assembly tools (Wtdbg2, Flye and Canu) and compared the results of different polishing methods: NeuralPolish, Racon, MarginPolish, HELEN and Medaka. Comprehensive experiments demonstrate that NeuralPolish achieves more accurate assemblies with fewer errors than the other polishing methods and can improve the accuracy of assemblies obtained by different assemblers.
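The final step described in the abstract, greedy CTC decoding, can be illustrated with a minimal sketch. This is not the NeuralPolish implementation: the alphabet, the choice of index 0 as the CTC blank, and the function name `ctc_greedy_decode` are assumptions made for illustration. The sketch takes one probability vector per time step, picks the most likely symbol at each step, collapses consecutive repeats, and drops blanks.

```python
from typing import List, Optional, Sequence

# Assumed symbol table: index 0 is the CTC blank, followed by the four bases.
ALPHABET: List[str] = ["-", "A", "C", "G", "T"]


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]],
    alphabet: Sequence[str] = ALPHABET,
    blank: int = 0,
) -> str:
    """Greedy (best-path) CTC decoding.

    probs[t][k] is the probability of symbol k at time step t.
    """
    decoded: List[str] = []
    prev: Optional[int] = None
    for row in probs:
        # Most likely symbol index at this time step.
        idx = max(range(len(row)), key=row.__getitem__)
        # Emit only when the symbol is not blank and differs from the
        # previous step; a blank between two equal symbols keeps both.
        if idx != blank and idx != prev:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


# Hypothetical per-position distributions (not real model output).
example_probs = [
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A
    [0.10, 0.60, 0.10, 0.10, 0.10],  # A (repeat, collapsed)
    [0.80, 0.05, 0.05, 0.05, 0.05],  # blank
    [0.20, 0.50, 0.10, 0.10, 0.10],  # A (kept: separated by a blank)
    [0.10, 0.10, 0.10, 0.60, 0.10],  # G
    [0.10, 0.10, 0.10, 0.10, 0.60],  # T
]
polished = ctc_greedy_decode(example_probs)
```

In this toy input the blank between the second and third `A` steps is what lets the decoder output two adjacent `A` bases, which is how a CTC output layer can represent homopolymer runs.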
Citation: Huang, N., Nie, F., Ni, P., Luo, F., Gao, X., & Wang, J. (2021). NeuralPolish: a novel Nanopore polishing method based on alignment matrix construction and orthogonal Bi-GRU Networks. Bioinformatics. doi:10.1093/bioinformatics/btab354
Sponsors: This work was supported in part by the National Natural Science Foundation of China under Grants Nos. U1909208 and 61772557, the 111 Project (No. B18059), and the Hunan Provincial Science and Technology Program (No. 2018wk4001) to J.W.; the U.S. National Institute of Food and Agriculture (NIFA) under grant 2017-70016-26051 and the U.S. National Science Foundation (NSF) under grant ABI-1759856 to F.L.; and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. FCC/1/1976-26-01, URF/1/3412-01-01, URF/1/4098-01-01, and REI/1/4473-01-01 to X.G.
Publisher: Oxford University Press (OUP)
- Polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads to improve genomic analyses.
- Authors: Chen Z, Erickson DL, Meng J
- Issue date: 2021 May
- Benchmarking Long-Read Assemblers for Genomic Analyses of Bacterial Pathogens Using Oxford Nanopore Sequencing.
- Authors: Chen Z, Erickson DL, Meng J
- Issue date: 2020 Dec 1
- Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm.
- Authors: Firtina C, Kim JS, Alser M, Senol Cali D, Cicek AE, Alkan C, Mutlu O
- Issue date: 2020 Jun 1
- Benchmarking of long-read assemblers for prokaryote whole genome sequencing.
- Authors: Wick RR, Holt KE
- Issue date: 2019
- An exploration of assembly strategies and quality metrics on the accuracy of the rewarewa (Knightia excelsa) genome.
- Authors: McCartney AM, Hilario E, Choi SS, Guhlin J, Prebble JM, Houliston G, Buckley TR, Chagné D
- Issue date: 2021 Aug