The genomic variation landscape of globally-circulating clades of SARS-CoV-2 defines a genetic barcoding scheme
Carr, Michael J
Arold, Stefan T.
KAUST Department: Biological and Environmental Sciences and Engineering (BESE) Division
Computational Bioscience Research Center (CBRC)
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Environmental Science and Engineering Program
King Abdullah University of Science and Technology (KAUST), Pathogen Genomics Laboratory, Biological and Environmental Science and Engineering (BESE), Thuwal-Jeddah, 23955-6900, Saudi Arabia.
Pathogen Genomics Laboratory
Structural Biology and Engineering
Permanent link to this record: http://hdl.handle.net/10754/662635
Abstract: We describe fifteen major mutation events from 2,058 high-quality SARS-CoV-2 genomes deposited up to March 31st, 2020. These events define five major clades (G, I, S, D and V) of globally-circulating viral populations, representing 85.7% of all sequenced cases, which we can identify using a 10-nucleotide genetic classifier, or barcode. We applied this barcode to 4,000 additional genomes deposited between March 31st and April 15th and successfully classified 95.6% of them into clades, demonstrating the utility of this approach. An analysis of amino acid variation in SARS-CoV-2 ORFs provided evidence of substitution events in the viral proteins involved in both host entry and genome replication. The systematic monitoring of dynamic changes in the SARS-CoV-2 genomes of circulating virus populations over time can guide therapeutic and prophylactic strategies to manage and contain the virus and, with available efficacious antivirals and vaccines, also aid in monitoring circulating genetic diversity as we proceed towards elimination of the agent. The barcode will add the genetic resolution needed to track and monitor infection clusters and to distinguish imported from indigenous cases, thereby aiding public health measures that seek to interrupt transmission chains without requiring real-time complete genome sequencing.
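The barcode idea in the abstract can be illustrated with a minimal Python sketch: read the nucleotides at ten fixed positions of an aligned genome, join them into a 10-letter string, and look that string up in a clade table. The positions, barcode strings, and clade labels below are hypothetical placeholders chosen for illustration only; this record does not list the actual sites or barcodes, and the function names are likewise assumptions rather than the authors' implementation.

```python
from typing import Dict, Sequence

# HYPOTHETICAL 0-based positions in an aligned genome (placeholders,
# not the ten sites used in the paper).
BARCODE_POSITIONS: Sequence[int] = (2, 5, 7, 11, 13, 17, 19, 23, 29, 31)

# HYPOTHETICAL barcode-to-clade table (placeholder barcodes and labels).
BARCODE_TO_CLADE: Dict[str, str] = {
    "AAAAAAAAAA": "clade_A",
    "CAAAAAAAAT": "clade_B",
}

UNCLASSIFIED = "unclassified"


def extract_barcode(aligned_genome: str,
                    positions: Sequence[int] = BARCODE_POSITIONS) -> str:
    """Concatenate the nucleotides found at the barcode positions."""
    return "".join(aligned_genome[p].upper() for p in positions)


def classify(aligned_genome: str,
             table: Dict[str, str] = BARCODE_TO_CLADE) -> str:
    """Return the clade for a genome, or UNCLASSIFIED if the barcode is unknown."""
    return table.get(extract_barcode(aligned_genome), UNCLASSIFIED)


def classified_fraction(genomes: Sequence[str]) -> float:
    """Fraction of genomes whose barcode matches a known clade."""
    if not genomes:
        return 0.0
    hits = sum(classify(g) != UNCLASSIFIED for g in genomes)
    return hits / len(genomes)


# Toy 32-nucleotide "genomes" built to match the placeholder table.
genome_a = "A" * 32
genome_b = "AAC" + "A" * 28 + "T"   # C at position 2, T at position 31
genome_x = "G" * 32                 # barcode not in the table
```

With these toy inputs, `classify(genome_a)` and `classify(genome_b)` resolve to the two placeholder clades, while `genome_x` stays unclassified. A fraction computed this way is the kind of figure the abstract reports for the 4,000 additional genomes, under the assumption that every genome is aligned to the same coordinate system.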
Citation: Guan, Q., Sadykov, M., Nugmanova, R., Carr, M. J., Arold, S. T., & Pain, A. (2020). The genomic variation landscape of globally-circulating clades of SARS-CoV-2 defines a genetic barcoding scheme. doi:10.1101/2020.04.21.054221
Sponsors: This work was supported by funding from King Abdullah University of Science and Technology (KAUST), Office of Sponsored Research (OSR), under award number FCC/1/1976-25-01. Work in AP’s laboratory is supported by the KAUST faculty baseline fund (BAS/1/1020-01-01) and research grants from the Office for Sponsored Research (OSR-2015-CRG4-2610, OCRF-2014-CRG3-2267). We thank all laboratories which have contributed sequences to the GISAID database. We thank Olga Douvropoulou, Raeece Naeem Mohamed Ghazzali and Sharif Hala for their support during the work. We also thank Richard Culleton (Nagasaki University, Japan) and Gabo Gonzalez (UCD, Ireland) for their critical comments on the manuscript draft.
Publisher: Cold Spring Harbor Laboratory