
dc.contributor.authorLiao, Xingyu
dc.contributor.authorLi, M
dc.contributor.authorHu, K
dc.contributor.authorWu, FX
dc.contributor.authorGao, Xin
dc.date.accessioned2021-07-11T08:45:30Z
dc.date.available2021-07-11T08:45:30Z
dc.date.issued2021-07-02
dc.date.submitted2020-03-07
dc.identifier.citationLiao, X., Li, M., Hu, K., Wu, F.-X., Gao, X., & Wang, J. (2021). A sensitive repeat identification framework based on short and long reads. Nucleic Acids Research. doi:10.1093/nar/gkab563
dc.identifier.issn0305-1048
dc.identifier.issn1362-4962
dc.identifier.pmid34214175
dc.identifier.doi10.1093/nar/gkab563
dc.identifier.urihttp://hdl.handle.net/10754/670106
dc.description.abstractNumerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot identify repeats satisfactorily in terms of both accuracy and size, since NGS reads are too short to identify long repeats whereas SMS (Single Molecule Sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer-based multiple sequence alignment for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify repeats to a greater extent; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates repeats faster and more accurately; (iii) by using multi-alignment unique k-mers rather than high-frequency k-mers to identify repeats in overlap sequences, it obtains repeats more comprehensively and stably; (iv) by applying a parallel alignment model based on the multi-alignment unique k-mers, it greatly improves the efficiency of data processing; and (v) by adopting corresponding identification strategies, it can identify structural variations that occur between repeats. Comprehensive experimental results show that LongRepMarker achieves more satisfactory results than existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
dc.description.sponsorshipNational Natural Science Foundation of China [62002388, 61772557]; The NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization [U1909208]; Hunan Provincial Science and Technology Program [2018wk4001]; 111 Project [B18059]; King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [BAS/1/1624-01, FCC/1/1976-18-01, FCC/1/1976-23-01, FCC/1/1976-25-01, FCC/1/1976-26-01, REI/1/0018-01-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4352-01-01, REI/1/4742-01-01, URF/1/4098-01-01]. Funding for open access charge: The NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization [U1909208]; Hunan Provincial Science and Technology Program [2018wk4001].
dc.publisherOxford University Press
dc.relation.urlhttps://academic.oup.com/nar/advance-article-abstract/doi/10.1093/nar/gkab563/6313241
dc.rightsThis is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/
dc.titleA sensitive repeat identification framework based on short and long reads
dc.typeArticle
dc.contributor.departmentComputer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.departmentComputer Science Program
dc.contributor.departmentComputational Bioscience Research Center (CBRC)
dc.identifier.journalNucleic Acids Research
dc.eprint.versionPublisher's Version/PDF
dc.contributor.institutionHunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
dc.contributor.institutionDepartment of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
kaust.personLiao, Xingyu
kaust.personGao, Xin
kaust.grant.numberBAS/1/1624-01
kaust.grant.numberFCC/1/1976-18-01
kaust.grant.numberFCC/1/1976-23-01
kaust.grant.numberFCC/1/1976-25-01
kaust.grant.numberFCC/1/1976-26-01
kaust.grant.numberREI/1/0018-01-01
kaust.grant.numberREI/1/4216-01-01
kaust.grant.numberREI/1/4437-01-01
kaust.grant.numberREI/1/4473-01-01
kaust.grant.numberURF/1/4098-01-01
kaust.grant.numberURF/1/4352-01-01
dc.date.accepted2021-06-18
refterms.dateFOA2021-07-11T08:49:00Z
kaust.acknowledged.supportUnitBAS
kaust.acknowledged.supportUnitNSFC
kaust.acknowledged.supportUnitOffice of Sponsored Research (OSR)
dc.date.published-online2021-07-02
dc.date.published-print2021-09-27


Files in this item

Name: gkab563.pdf
Size: 2.726Mb
Format: PDF
Description: Publisher's version
