The genomic variation landscape of globally-circulating clades of SARS-CoV-2 defines a genetic barcoding scheme
Type
Preprint
Authors
Guan, Qingtian
Sadykov, Mukhtar
Nugmanova, Raushan

Carr, Michael J
Arold, Stefan T.

Pain, Arnab

KAUST Department
Biological and Environmental Sciences and Engineering (BESE) Division
Bioscience Program
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Environmental Science and Engineering Program
King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia.
Pathogen Genomics Laboratory
Structural Biology and Engineering
KAUST Grant Number
FCC/1/1976-25-01
OCRF-2014-CRG3-2267
OSR-2015-CRG4-2610
Date
2020-04-23
Permanent link to this record
http://hdl.handle.net/10754/662635
Abstract
We describe fifteen major mutation events from 2,058 high-quality SARS-CoV-2 genomes deposited up to March 31st, 2020. These events define five major clades (G, I, S, D and V) of globally-circulating viral populations, representing 85.7% of all sequenced cases, which we can identify using a 10-nucleotide genetic classifier or barcode. We applied this barcode to 4,000 additional genomes deposited between March 31st and April 15th and successfully classified 95.6% of the clades, demonstrating the utility of this approach. An analysis of amino acid variation in SARS-CoV-2 ORFs provided evidence of substitution events in the viral proteins involved in both host entry and genome replication. The systematic monitoring of dynamic changes in the SARS-CoV-2 genomes of circulating virus populations over time can guide therapeutic and prophylactic strategies to manage and contain the virus and, with available efficacious antivirals and vaccines, can also aid in monitoring circulating genetic diversity as we proceed towards elimination of the agent. The barcode will add the necessary genetic resolution to facilitate tracking and monitoring of infection clusters, to distinguish imported from indigenous cases, and thereby to aid public health measures seeking to interrupt transmission chains without the requirement for real-time complete genome sequencing.
Citation
Guan, Q., Sadykov, M., Nugmanova, R., Carr, M. J., Arold, S. T., & Pain, A. (2020). The genomic variation landscape of globally-circulating clades of SARS-CoV-2 defines a genetic barcoding scheme. doi:10.1101/2020.04.21.054221
Sponsors
This work was supported by funding from King Abdullah University of Science and Technology (KAUST), Office of Sponsored Research (OSR), under award number FCC/1/1976-25-01. Work in AP’s laboratory is supported by the KAUST faculty baseline fund (BAS/1/1020-01-01) and research grants from the Office for Sponsored Research (OSR-2015-CRG4-2610, OCRF-2014-CRG3-2267). We thank all laboratories that have contributed sequences to the GISAID database. We thank Olga Douvropoulou, Raeece Naeem Mohamed Ghazzali and Sharif Hala for their support during the work. We also thank Richard Culleton (Nagasaki University, Japan) and Gabo Gonzalez (UCD, Ireland) for their critical comments on the manuscript draft.
Publisher
Cold Spring Harbor Laboratory
Additional Links
http://biorxiv.org/lookup/doi/10.1101/2020.04.21.054221
https://www.biorxiv.org/content/biorxiv/early/2020/04/23/2020.04.21.054221.full.pdf
DOI
10.1101/2020.04.21.054221