
dc.contributor.author: Liao, Xingyu
dc.contributor.author: Gao, Xin
dc.contributor.author: Zhang, Xiankai
dc.contributor.author: Wu, Fang-Xiang
dc.contributor.author: Wang, Jianxin
dc.date.accessioned: 2020-10-22T12:29:07Z
dc.date.available: 2020-10-22T12:29:07Z
dc.date.issued: 2020-10-19
dc.date.submitted: 2019-12-10
dc.identifier.citation: Liao, X., Gao, X., Zhang, X., Wu, F.-X., & Wang, J. (2020). RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads. BMC Bioinformatics, 21(1). doi:10.1186/s12859-020-03779-w
dc.identifier.issn: 1471-2105
dc.identifier.pmid: 33076827
dc.identifier.doi: 10.1186/s12859-020-03779-w
dc.identifier.uri: http://hdl.handle.net/10754/665651
dc.description.abstract: BACKGROUND: Repetitive sequences account for a large proportion of eukaryotic genomes. Identification of repetitive sequences plays a significant role in many applications, such as structural variation detection and genome assembly. Many existing de novo repeat identification pipelines and tools obtain repeats by assembling high-frequency k-mers. However, assemblers require a certain degree of sequence coverage to produce the desired assemblies. In addition, assemblers cut the reads into shorter k-mers for assembly, which may destroy the structure of the repetitive regions. For these reasons, it is difficult to obtain complete and accurate repetitive regions in the genome with existing tools. RESULTS: In this study, we present a new method called RepAHR for de novo repeat identification by assembly of the high-frequency reads. First, RepAHR scans next-generation sequencing (NGS) reads to find the high-frequency k-mers. Second, RepAHR filters the high-frequency reads from the full set of NGS reads according to certain rules based on the high-frequency k-mers. Finally, the high-frequency reads are assembled into repeats using SPAdes, which is regarded as an outstanding genome assembler for NGS data. CONCLUSIONS: We test RepAHR on five data sets, and the experimental results show that RepAHR outperforms RepARK and REPdenovo for detecting repeats in terms of N50, reference alignment ratio, coverage ratio of reference, mask ratio of Repbase, and several other metrics.
dc.description.sponsorship: The authors would like to thank the editor and anonymous reviewers for their valuable comments in improving the manuscript. They also thank the National Natural Science Foundation of China, the Hunan Provincial Science and Technology Program, the 111 Project, and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) for their support of this study.
dc.publisher: Springer Nature
dc.relation.url: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03779-w
dc.rights: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads.
dc.type: Article
dc.contributor.department: Computational Bioscience Research Center (CBRC)
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.contributor.department: Structural and Functional Bioinformatics Group
dc.identifier.journal: BMC Bioinformatics
dc.eprint.version: Publisher's Version/PDF
dc.contributor.institution: School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083, China.
dc.contributor.institution: Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada.
dc.identifier.volume: 21
dc.identifier.issue: 1
kaust.person: Gao, Xin
dc.date.accepted: 2020-09-24
refterms.dateFOA: 2020-10-22T12:30:38Z
kaust.acknowledged.supportUnit: Office of Sponsored Research (OSR)
dc.date.published-online: 2020-10-19
dc.date.published-print: 2020-12
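
Illustrative sketch (not part of the record): the abstract describes counting high-frequency k-mers and then selecting high-frequency reads "according to certain rules" before assembly. The Python below is a minimal sketch of that idea, not the authors' implementation. The function names, the `min_count` threshold, and the `min_fraction` selection rule are all assumptions made for illustration; the abstract does not specify the actual rules, and reverse complements and the SPAdes assembly step are not modeled here.

```python
from collections import Counter


def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def high_frequency_kmers(reads, k, min_count):
    """Return the k-mers seen at least min_count times across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return {kmer for kmer, n in counts.items() if n >= min_count}


def high_frequency_reads(reads, k, min_count, min_fraction):
    """Keep reads whose share of high-frequency k-mers is >= min_fraction.

    The fraction rule is a hypothetical stand-in for the unspecified
    "certain rules" mentioned in the abstract.
    """
    frequent = high_frequency_kmers(reads, k, min_count)
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: no k-mers to score
        hits = sum(1 for kmer in read_kmers if kmer in frequent)
        if hits / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected
```

Under these assumptions, the selected reads would then be written to a FASTQ or FASTA file and passed to an assembler; a production pipeline would normally use a dedicated k-mer counter rather than an in-memory `Counter`.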


Files in this item

Name: RepAHR.pdf
Size: 2.389Mb
Format: PDF
Description: published version
