A geographically-diverse collection of 418 human gut microbiome pathway genome databases

Advances in high-throughput sequencing are reshaping how we perceive microbial communities inhabiting the human body, with implications for therapeutic interventions. Several large-scale datasets derived from hundreds of human microbiome samples sourced from multiple studies are now publicly available. However, idiosyncratic data processing methods between studies introduce systematic differences that confound comparative analyses. To overcome these challenges, we developed GutCyc, a compendium of environmental pathway genome databases (ePGDBs) constructed from 418 assembled human microbiome datasets using MetaPathways, enabling reproducible functional metagenomic annotation. We also generated metabolic network reconstructions for each metagenome using the Pathway Tools software, empowering researchers and clinicians interested in visualizing and interpreting metabolic pathways encoded by the human gut microbiome. For the first time, GutCyc provides consistent annotations and metabolic pathway predictions, making possible comparative community analyses between health and disease states in inflammatory bowel disease, Crohn’s disease, and type 2 diabetes. GutCyc data products are searchable online, or may be downloaded and explored locally using MetaPathways and Pathway Tools.

Hahn AS, Altman T, Konwar KM, Hanson NW, Kim D, et al. (2017) A geographically-diverse collection of 418 human gut microbiome pathway genome databases. Scientific Data 4: 170035. Available: http://dx.doi.org/10.1038/sdata.2017.35.

We would like to thank Peter D. Karp for feedback on the MetaPathways software and the GutCyc project; Robert Pesich for orchestrating our sneakernet transfer of data; and Les Dethlefsen for assisting in loading the data onto the Relman Lab server. A special thanks to the members of the Hallam, Relman, and Dill labs, and Whole Biome, for constructive feedback on the GutCyc project. Thank you to Pallavi Subhraveti of SRI International for help with exporting GutCyc data using Pathway Tools. Thank you to the Stanford FarmShare computation resource for aiding in the development of an early version of GutCyc. The GutCyc project at UBC was carried out under the auspices of Compute/Calcul Canada, Genome Canada, Genome British Columbia, Genome Alberta, the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Ecosystem Services, Commercialization Platforms and Entrepreneurship (ECOSCOPE) program, the Canada Foundation for Innovation (CFI), and the Canadian Institute for Advanced Research (CIFAR) through grants awarded to S.J.H. A.S.H. was supported by the Alexander Graham Bell Canada Graduate Scholarships-Doctoral Program (CGS D) administered by NSERC. K.M.K. was supported by the Tula Foundation-funded Centre for Microbial Diversity and Evolution (CMDE) at UBC. N.W.H. was supported by a four-year doctoral fellowship (4YF) administered through the UBC Faculty of Graduate and Postdoctoral Studies. T.A. was partially supported by the Stanford University School of Medicine Dean’s Funds and the NIH Biotechnology Training Grant at Stanford (grant number 5T32 GM008412). T.A. and D.L.D. were partially supported by a King Abdullah University of Science and Technology (KAUST) research grant under the KAUST Stanford Academic Excellence Alliance program. D.A.R. was supported by NIH/NIGMS 5R01GM099534 and by the Thomas C. and Joan M. Merigan Endowment at Stanford University.
Additional computational resources were provided gratis through the Stanford FarmShare resource.

Springer Nature

Scientific Data

