msRepDB: a comprehensive repetitive sequence database of over 80 000 species.
Type
Article
KAUST Department
Computational Bioscience Research Center (CBRC)
Computer Science Program
Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Structural and Functional Bioinformatics Group
KAUST Grant Number
FCC/1/1976-18-01
FCC/1/1976-23-01
FCC/1/1976-25-01
FCC/1/1976-26-01
OSR
REI/1/0018-01-01
REI/1/4216-01-01
REI/1/4437-01-01
REI/1/4473-01-01
REI/1/4742-01
URF/1/4098-01-01
URF/1/4352-01-01
URF/1/4379-01-0
Date
2021-12-01
Online Publication Date
2021-12-01
Print Publication Date
2022-01-07
Permanent link to this record
http://hdl.handle.net/10754/673931
Abstract
Repeats are prevalent in the genomes of all bacteria, plants and animals, and they cover nearly half of the human genome. They play indispensable roles in evolution, inheritance, variation and genomic instability, and serve as substrates for chromosomal rearrangements that include disease-causing deletions, inversions and translocations. Comprehensive identification, classification and annotation of repeats in genomes can provide accurate and targeted solutions towards understanding and diagnosis of complex diseases, optimization of plant properties and development of new drugs. RepBase and Dfam are the two most frequently used repeat databases, but they are not sufficiently complete. Because no comprehensive multi-species repeat database exists, current research in this field is far from satisfactory. LongRepMarker is a framework recently developed by our group for comprehensive identification of genomic repeats. Building on LongRepMarker, we here propose msRepDB, which is currently the most comprehensive multi-species repeat database, covering >80 000 species. Comprehensive evaluations show that msRepDB contains more species, and more complete repeats and families, than the RepBase and Dfam databases (https://msrepdb.cbrc.kaust.edu.sa/pages/msRepDB/index.html).
Citation
Liao, X., Hu, K., Salhi, A., Zou, Y., Wang, J., & Gao, X. (2021). msRepDB: a comprehensive repetitive sequence database of over 80 000 species. Nucleic Acids Research. doi:10.1093/nar/gkab1089
Sponsors
This work was supported by the National Natural Science Foundation of China [62002388, 61732009, 61772557, U1909208], King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [FCC/1/1976-18-01, FCC/1/1976-23-01, FCC/1/1976-25-01, FCC/1/1976-26-01, REI/1/0018-01-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4352-01-01, URF/1/4379-01-01, REI/1/4742-01-01, URF/1/4098-01-01], Hunan Provincial Natural Science Foundation of China [2021JJ40787], Hunan Provincial Science and Technology Program [2018wk4001] and 111 Project [B18059]. This work was carried out in part using computing resources at the High Performance Computing Center of Central South University.
Publisher
Oxford University Press (OUP)
Journal
Nucleic acids research
PubMed ID
34850956
DOI
10.1093/nar/gkab1089
Related articles
- The Dfam database of repetitive DNA families.
  - Authors: Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, Smit AF, Wheeler TJ
  - Issue date: 2016 Jan 4
- Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.
  - Authors: Kohany O, Gentles AJ, Hankus L, Jurka J
  - Issue date: 2006 Oct 25
- Repbase Update, a database of eukaryotic repetitive elements.
  - Authors: Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J
  - Issue date: 2005
- Dfam: a database of repetitive DNA based on profile hidden Markov models.
  - Authors: Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, Smit AF, Finn RD
  - Issue date: 2013 Jan
- A sensitive repeat identification framework based on short and long reads.
  - Authors: Liao X, Li M, Hu K, Wu FX, Gao X, Wang J
  - Issue date: 2021 Sep 27