Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data

Handle URI:
http://hdl.handle.net/10754/567063
Title:
Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data
Authors:
Allam, Amin (0000-0001-5137-0990); Kalnis, Panos (0000-0002-5060-1360); Solovyev, Victor (0000-0001-8885-493X)
Abstract:
Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but their accuracy is low in many cases.
Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction.
KAUST Department:
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Citation:
Allam, Amin, Panos Kalnis, and Victor Solovyev. "Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data." Bioinformatics (2015): btv415.
Publisher:
Oxford University Press (OUP)
Journal:
Bioinformatics
Issue Date:
14-Jul-2015
DOI:
10.1093/bioinformatics/btv415
Type:
Article
ISSN:
1367-4803; 1460-2059
Additional Links:
http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btv415
Appears in Collections:
Articles; Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division

Full metadata record

DC Field | Value | Language
dc.contributor.author | Allam, Amin | en
dc.contributor.author | Kalnis, Panos | en
dc.contributor.author | Solovyev, Victor | en
dc.date.accessioned | 2015-08-17T08:21:19Z | en
dc.date.available | 2015-08-17T08:21:19Z | en
dc.date.issued | 2015-07-14 | en
dc.identifier.citation | Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data 2015:btv415 Bioinformatics | en
dc.identifier.issn | 1367-4803 | en
dc.identifier.issn | 1460-2059 | en
dc.identifier.doi | 10.1093/bioinformatics/btv415 | en
dc.identifier.uri | http://hdl.handle.net/10754/567063 | en
dc.description.abstract | Motivation: Next-generation sequencing generates large amounts of data affected by errors in the form of substitutions, insertions or deletions of bases. Error correction based on high-coverage information typically improves de novo assembly. Most existing tools can correct substitution errors only; some support insertions and deletions, but their accuracy is low in many cases. Results: We present Karect, a novel error correction technique based on multiple alignment. Our approach supports substitution, insertion and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome. Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines demonstrate that Karect is more accurate than previous methods, both in terms of correcting individual-base errors (up to 10% increase in accuracy gain) and post de novo assembly quality (up to 10% increase in NGA50). We also introduce an improved framework for evaluating the quality of error correction. | en
dc.language.iso | en | en
dc.publisher | Oxford University Press (OUP) | en
dc.relation.url | http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btv415 | en
dc.rights | This is a pre-copyedited, author-produced PDF of an article accepted for publication in Bioinformatics following peer review. The version of record Allam, Amin, Panos Kalnis, and Victor Solovyev. "Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data." Bioinformatics (2015): btv415. is available online at: http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btv415. | en
dc.title | Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data | en
dc.type | Article | en
dc.contributor.department | Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division | en
dc.identifier.journal | Bioinformatics | en
dc.eprint.version | Post-print | en
dc.contributor.affiliation | King Abdullah University of Science and Technology (KAUST) | en
kaust.author | Allam, Amin | en
kaust.author | Kalnis, Panos | en
kaust.author | Solovyev, Victor | en