Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection

Handle URI:
http://hdl.handle.net/10754/597652
Title:
Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection
Authors:
Dhavala, Soma S.; Datta, Sujay; Mallick, Bani K.; Carroll, Raymond J.; Khare, Sangeeta; Lawhon, Sara D.; Adams, L. Garry
Abstract:
Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA-type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.
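Illustrative note: the abstract names a zero-inflated Poisson (ZIP) distribution for the signature counts. The sketch below shows only the textbook ZIP probability mass function, not the authors' hierarchical model; the function names and the parameters `pi` (probability of a structural zero) and `lam` (Poisson rate) are assumptions introduced here for illustration.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) under a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from Poisson(lam). Names and parameterization are illustrative.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from either the structural-zero or the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(pi: float, lam: float) -> float:
    """Mean of the zero-inflated Poisson: (1 - pi) * lam."""
    return (1.0 - pi) * lam
```

Compared with a plain Poisson at the same rate, the ZIP places extra mass at zero, which is the feature that makes it a candidate for count data with many zero counts.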
Citation:
Dhavala SS, Datta S, Mallick BK, Carroll RJ, Khare S, et al. (2010) Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection. Journal of the American Statistical Association 105: 956–967. Available: http://dx.doi.org/10.1198/jasa.2010.ap08327.
Publisher:
Informa UK Limited
Journal:
Journal of the American Statistical Association
KAUST Grant Number:
KUS-CI-016-04
Issue Date:
Sep-2010
DOI:
10.1198/jasa.2010.ap08327
Type:
Article
ISSN:
0162-1459; 1537-274X
Sponsors:
Soma S. Dhavala is a Doctoral Candidate, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843 (E-mail: soma@stat.tamu.edu). Sujay Datta is Senior Scientist and Faculty Member, Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Research Center, M2-C125, 1100 Fairview Avenue N., Seattle, WA 98109 (E-mail: sdatta@fhcrc.org). Bani K. Mallick is Professor, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843 (E-mail: bmallick@stat.tamu.edu). Raymond J. Carroll is Distinguished Professor, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843 (E-mail: carroll@stat.tamu.edu). Sangeeta Khare is Research Assistant Professor, Department of Veterinary Pathobiology, Texas A&M University, 4467 TAMU, College Station, TX 77843 (E-mail: skhare@cvm.tamu.edu). Sara D. Lawhon is Assistant Professor, Department of Veterinary Pathobiology, Texas A&M University, 4467 TAMU, College Station, TX 77843 (E-mail: slawhon@cvm.tamu.edu). L. Garry Adams is Professor, Department of Veterinary Pathobiology, Texas A&M University, 4467 TAMU, College Station, TX 77843 (E-mail: gadams@cvm.tamu.edu). The research of Bani K. Mallick and Raymond J. Carroll was supported by National Cancer Institute grants (CA 104620 and CA57030, respectively), National Science Foundation grant DMS 0914951, and by award KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST). The research of Sujay Datta was supported by a postdoctoral training grant from the National Cancer Institute (CA90301). The research of L. Garry Adams was supported by the grants NIAID 1 RO1 A144170-01A1, USDA 2002-35204-12247, and NSF DMS 0914951. Public Health Service grant AI060933 supported the research of Sara D. Lawhon. The authors are grateful to Dr. David Dahl for discussions, and to the editors and the two anonymous referees for their suggestions and constructive comments.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Dhavala, Soma S. | en
dc.contributor.author | Datta, Sujay | en
dc.contributor.author | Mallick, Bani K. | en
dc.contributor.author | Carroll, Raymond J. | en
dc.contributor.author | Khare, Sangeeta | en
dc.contributor.author | Lawhon, Sara D. | en
dc.contributor.author | Adams, L. Garry | en
dc.date.accessioned | 2016-02-25T12:43:45Z | en
dc.date.available | 2016-02-25T12:43:45Z | en
dc.date.issued | 2010-09 | en
dc.identifier.citation | Dhavala SS, Datta S, Mallick BK, Carroll RJ, Khare S, et al. (2010) Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection. Journal of the American Statistical Association 105: 956–967. Available: http://dx.doi.org/10.1198/jasa.2010.ap08327. | en
dc.identifier.issn | 0162-1459 | en
dc.identifier.issn | 1537-274X | en
dc.identifier.doi | 10.1198/jasa.2010.ap08327 | en
dc.identifier.uri | http://hdl.handle.net/10754/597652 | en
dc.publisher | Informa UK Limited | en
dc.subject | Bayesian semiparametric modeling | en
dc.subject | Dirichlet process mixture | en
dc.subject | Markov chain Monte Carlo | en
dc.subject | Zero-inflated Poisson | en
dc.title | Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection | en
dc.type | Article | en
dc.identifier.journal | Journal of the American Statistical Association | en
dc.contributor.institution | Texas A and M University, College Station, United States | en
dc.contributor.institution | Fred Hutchinson Cancer Research Center, Seattle, United States | en
kaust.grant.number | KUS-CI-016-04 | en