Bayesian Subset Modeling for High-Dimensional Generalized Linear Models

Handle URI:
http://hdl.handle.net/10754/597659
Title:
Bayesian Subset Modeling for High-Dimensional Generalized Linear Models
Authors:
Liang, Faming; Song, Qifan; Yu, Kai
Abstract:
This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association.
Citation:
Liang F, Song Q, Yu K (2013) Bayesian Subset Modeling for High-Dimensional Generalized Linear Models. Journal of the American Statistical Association 108: 589–606. Available: http://dx.doi.org/10.1080/01621459.2012.761942.
Publisher:
Informa UK Limited
Journal:
Journal of the American Statistical Association
KAUST Grant Number:
KUS-C1-016-04
Issue Date:
Jun-2013
DOI:
10.1080/01621459.2012.761942
Type:
Article
ISSN:
0162-1459; 1537-274X
Sponsors:
Faming Liang is Professor, Department of Statistics, Texas A&M University, College Station, TX 77843-3143 (E-mail: fliang@stat.tamu.edu). Qifan Song is Graduate Student, Department of Statistics, Texas A&M University, College Station, TX 77843-3143 (E-mail: qsong@stat.tamu.edu). Kai Yu is Investigator, Division of Cancer Epidemiology & Genetics, National Cancer Institute, Rockville, MD 20892-7335 (E-mail: yuka@mail.nih.gov). Liang's research was partially supported by grants from the National Science Foundation (DMS-1007457 and DMS-1106494) and the award (KUS-C1-016-04) made by King Abdullah University of Science and Technology (KAUST). The authors thank Dr. Chris Hans for sending us the lymph data, and thank the editor, associate editor, and two referees for their constructive comments that have led to significant improvement of this article.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Liang, Faming | en
dc.contributor.author | Song, Qifan | en
dc.contributor.author | Yu, Kai | en
dc.date.accessioned | 2016-02-25T12:43:54Z | en
dc.date.available | 2016-02-25T12:43:54Z | en
dc.date.issued | 2013-06 | en
dc.identifier.citation | Liang F, Song Q, Yu K (2013) Bayesian Subset Modeling for High-Dimensional Generalized Linear Models. Journal of the American Statistical Association 108: 589–606. Available: http://dx.doi.org/10.1080/01621459.2012.761942. | en
dc.identifier.issn | 0162-1459 | en
dc.identifier.issn | 1537-274X | en
dc.identifier.doi | 10.1080/01621459.2012.761942 | en
dc.identifier.uri | http://hdl.handle.net/10754/597659 | en
dc.description.abstract | This article presents a new prior setting for high-dimensional generalized linear models, which leads to a Bayesian subset regression (BSR) with the maximum a posteriori model approximately equivalent to the minimum extended Bayesian information criterion model. The consistency of the resulting posterior is established under mild conditions. Further, a variable screening procedure is proposed based on the marginal inclusion probability, which shares the same properties of sure screening and consistency with the existing sure independence screening (SIS) and iterative sure independence screening (ISIS) procedures. However, since the proposed procedure makes use of joint information from all predictors, it generally outperforms SIS and ISIS in real applications. This article also makes extensive comparisons of BSR with the popular penalized likelihood methods, including Lasso, elastic net, SIS, and ISIS. The numerical results indicate that BSR can generally outperform the penalized likelihood methods. The models selected by BSR tend to be sparser and, more importantly, of higher prediction ability. In addition, the performance of the penalized likelihood methods tends to deteriorate as the number of predictors increases, while this is not significant for BSR. Supplementary materials for this article are available online. © 2013 American Statistical Association. | en
dc.description.sponsorship | Faming Liang is Professor, Department of Statistics, Texas A&M University, College Station, TX 77843-3143 (E-mail: fliang@stat.tamu.edu). Qifan Song is Graduate Student, Department of Statistics, Texas A&M University, College Station, TX 77843-3143 (E-mail: qsong@stat.tamu.edu). Kai Yu is Investigator, Division of Cancer Epidemiology & Genetics, National Cancer Institute, Rockville, MD 20892-7335 (E-mail: yuka@mail.nih.gov). Liang's research was partially supported by grants from the National Science Foundation (DMS-1007457 and DMS-1106494) and the award (KUS-C1-016-04) made by King Abdullah University of Science and Technology (KAUST). The authors thank Dr. Chris Hans for sending us the lymph data, and thank the editor, associate editor, and two referees for their constructive comments that have led to significant improvement of this article. | en
dc.publisher | Informa UK Limited | en
dc.subject | Bayesian classification | en
dc.subject | Posterior consistency | en
dc.subject | Stochastic approximation Monte Carlo | en
dc.subject | Sure variable screening | en
dc.subject | Variable selection | en
dc.title | Bayesian Subset Modeling for High-Dimensional Generalized Linear Models | en
dc.type | Article | en
dc.identifier.journal | Journal of the American Statistical Association | en
dc.contributor.institution | Texas A and M University, College Station, United States | en
dc.contributor.institution | National Cancer Institute, Bethesda, United States | en
kaust.grant.number | KUS-C1-016-04 | en