KAUST Supercomputing Laboratory (KSL)

Recent Submissions

  • Article

    A high-performance computational workflow to accelerate GATK SNP detection across a 25-genome dataset

    (Springer Science and Business Media LLC, 2024-01-25) Zhou, Yong; Kathiresan, Nagarajan; Yu, Zhichao; Rivera, Luis F.; Yang, Yujian; Thimma, Manjula; Manickam, Keerthana; Chebotarov, Dmytro; Mauleon, Ramil; Chougule, Kapeel; Wei, Sharon; Gao, Tingting; Green, Carl Douglas; Zuccolo, Andrea; Xie, Weibo; Ware, Doreen; Zhang, Jianwei; McNally, Kenneth L.; Wing, Rod A.; Center for Desert Agriculture (CDA), Biological and Environmental Sciences & Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia; Information Technology Department, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia; Biological and Environmental Science and Engineering (BESE) Division; Bioscience Program; Information Security; Information Technology; Center for Desert Agriculture; KAUST Supercomputing Laboratory (KSL); Arizona Genomics Institute (AGI), School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, USA; National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, 430070, China; International Rice Research Institute (IRRI), Los Baños, Laguna, 4031, Philippines; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA; Crop Science Research Center (CSRC), Scuola Superiore Sant’Anna, Pisa, 56127, Italy

    Background: Single-nucleotide polymorphisms (SNPs) are the most widely used form of molecular genetic variation studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable.

    Results: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy, with results comparable to previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a “subpopulation aware” 16-genome rice reference panel with ~3000 resequenced rice accessions. The entire process took ~16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq).

    Conclusions: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable, as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences. Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment.
    
  • Software

    Uauy-Lab/monococcum_introgressions: Analysis of monococcum introgressions into hexaploid wheat

    (Github, 2022-08-10) Ahmed, Hanin; Heuberger, Matthias; Schoen, Adam; Koo, Dal-Hoe; Quiroz-Chávez, Jesús; Adhikari, Laxman; Raupp, John; Cauet, Stéphane; Rodde, Nathalie; Cravero, Charlotte; Callot, Caroline; Lazo, Gerard R.; Kathiresan, Nagarajan; Sharma, Parva K.; Moot, Ian; Yadav, Inderjit Singh; Singh, Lovepreet; Saripalli, Gautam; Rawat, Nidhi; Datla, Raju; Athiyannan, Naveenkumar; Ramirez-Gonzalez, Ricardo H.; Uauy, Cristobal; Wicker, Thomas; Tiwari, Vijay; Abrouk, Michael; Poland, Jesse; Krattinger, Simon G.; KAUST Supercomputing Core Lab (KSL), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.; Bioscience Program; Biological and Environmental Science and Engineering (BESE) Division; Center for Desert Agriculture; Supercomputing, Computational Scientists; Plant Science; Plant Science Program; KAUST Supercomputing Laboratory (KSL); Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.; Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA.; Wheat Genetics Resource Center and Department of Plant Pathology, Kansas State University, Manhattan, KS, USA.; John Innes Centre, Norwich Research Park, Norwich, UK.; INRAE, CNRGV French Plant Genomic Resource Center, Castanet-Tolosan, France.; Crop Improvement and Genetics Research Unit, Western Regional Research Center, Agricultural Research Service, United States Department of Agriculture, Albany, CA, USA.; Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.; Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA. vktiwari@umd.edu.

    Analysis of monococcum introgressions into hexaploid wheat

  • Software

    IBEXCluster/Wheat-SNPCaller: Wheat SNP Caller pipeline

    (Github, 2022-03-27) Ahmed, Hanin; Kathiresan, Nagarajan; Abrouk, Michael; KAUST Supercomputing Core Lab (KSL), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.; Bioscience Program; Biological and Environmental Science and Engineering (BESE) Division; Supercomputing, Computational Scientists; Plant Science Program; Center for Desert Agriculture; KAUST Supercomputing Laboratory (KSL)

    Wheat SNP Caller pipeline

  • Dataset

    Data for: Einkorn genomics sheds light on history of the oldest domesticated wheat

    (Dryad, 2021) Ahmed, Hanin; Heuberger, Matthias; Schoen, Adam; Koo, Dal-Hoe; Quiroz-Chávez, Jesús; Adhikari, Laxman; Raupp, John; Cauet, Stéphane; Rodde, Nathalie; Cravero, Charlotte; Callot, Caroline; Lazo, Gerard R.; Kathiresan, Nagarajan; Sharma, Parva K.; Moot, Ian; Yadav, Inderjit Singh; Singh, Lovepreet; Saripalli, Gautam; Rawat, Nidhi; Datla, Raju; Athiyannan, Naveenkumar; Ramirez-Gonzalez, Ricardo H.; Uauy, Cristobal; Wicker, Thomas; Tiwari, Vijay; Abrouk, Michael; Poland, Jesse; Krattinger, Simon G.; KAUST Supercomputing Core Lab (KSL), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.; Bioscience Program; Biological and Environmental Science and Engineering (BESE) Division; Center for Desert Agriculture; Supercomputing, Computational Scientists; Plant Science; Plant Science Program; KAUST Supercomputing Laboratory (KSL); Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.; Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA.; Wheat Genetics Resource Center and Department of Plant Pathology, Kansas State University, Manhattan, KS, USA.; John Innes Centre, Norwich Research Park, Norwich, UK.; INRAE, CNRGV French Plant Genomic Resource Center, Castanet-Tolosan, France.; Crop Improvement and Genetics Research Unit, Western Regional Research Center, Agricultural Research Service, United States Department of Agriculture, Albany, CA, USA.; Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.; Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA. vktiwari@umd.edu.

    Einkorn (Triticum monococcum) is the first domesticated wheat species, being central to the birth of agriculture and the Neolithic Revolution in the Fertile Crescent ~10,000 years ago. Here, we generate and analyze 5.2-gigabase genome assemblies for wild and domesticated einkorn, including completely assembled centromeres. Einkorn centromeres are highly dynamic, showing evidence of ancient and recent centromere shifts caused by structural rearrangements. Whole-genome sequencing of a diversity panel uncovered the population structure and evolutionary history of einkorn, revealing complex patterns of hybridizations and introgressions following the dispersal of domesticated einkorn from the Fertile Crescent. We also discovered that around 1% of the modern bread wheat (Triticum aestivum) A subgenome originates from einkorn. These resources and findings highlight the history of einkorn evolution and provide a basis to accelerate the genomics-assisted improvement of einkorn and bread wheat.

  • Dataset

    CENH3 information from: Einkorn genomics sheds light on history of the oldest domesticated wheat

    (Dryad, 2022) Ahmed, Hanin; Heuberger, Matthias; Schoen, Adam; Koo, Dal-Hoe; Quiroz-Chávez, Jesús; Adhikari, Laxman; Raupp, John; Cauet, Stéphane; Rodde, Nathalie; Cravero, Charlotte; Callot, Caroline; Lazo, Gerard R.; Kathiresan, Nagarajan; Sharma, Parva K.; Moot, Ian; Yadav, Inderjit Singh; Singh, Lovepreet; Saripalli, Gautam; Rawat, Nidhi; Datla, Raju; Athiyannan, Naveenkumar; Ramirez-Gonzalez, Ricardo H.; Uauy, Cristobal; Wicker, Thomas; Tiwari, Vijay; Abrouk, Michael; Poland, Jesse; Krattinger, Simon G.; KAUST Supercomputing Core Lab (KSL), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.; Bioscience Program; Biological and Environmental Science and Engineering (BESE) Division; Center for Desert Agriculture; Supercomputing, Computational Scientists; Plant Science; Plant Science Program; KAUST Supercomputing Laboratory (KSL); Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.; Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA.; Wheat Genetics Resource Center and Department of Plant Pathology, Kansas State University, Manhattan, KS, USA.; John Innes Centre, Norwich Research Park, Norwich, UK.; INRAE, CNRGV French Plant Genomic Resource Center, Castanet-Tolosan, France.; Crop Improvement and Genetics Research Unit, Western Regional Research Center, Agricultural Research Service, United States Department of Agriculture, Albany, CA, USA.; Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada.; Department of Plant Science and Landscape Architecture, University of Maryland, College Park, MD, USA. vktiwari@umd.edu.

    Einkorn (Triticum monococcum) is the first domesticated wheat species, being central to the birth of agriculture and the Neolithic Revolution in the Fertile Crescent ~10,000 years ago. Here, we generate and analyze 5.2-gigabase genome assemblies for wild and domesticated einkorn, including completely assembled centromeres. Einkorn centromeres are highly dynamic, showing evidence of ancient and recent centromere shifts caused by structural rearrangements. Whole-genome sequencing of a diversity panel uncovered the population structure and evolutionary history of einkorn, revealing complex patterns of hybridizations and introgressions following the dispersal of domesticated einkorn from the Fertile Crescent. We also discovered that around 1% of the modern bread wheat (Triticum aestivum) A subgenome originates from einkorn. These resources and findings highlight the history of einkorn evolution and provide a basis to accelerate the genomics-assisted improvement of einkorn and bread wheat.