Bayesian site selection for fast Gaussian process regression

Handle URI:
http://hdl.handle.net/10754/597658
Title:
Bayesian site selection for fast Gaussian process regression
Authors:
Pourhabib, Arash; Liang, Faming; Ding, Yu
Abstract:
Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC.
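Illustrative note (not part of the published record): the cost argument in the abstract can be sketched in a few lines of NumPy. The exact GP solves an n x n linear system, while a pseudo-point approximation solves only an m x m system with m much smaller than n. The sketch below uses the subset-of-regressors predictive mean as a stand-in for the likelihood approximations mentioned above. It keeps the pseudo inputs on a fixed grid and does not implement the reversible jump MCMC site selection from the article. The squared-exponential kernel, its hyperparameters, the noise variance, the jitter term, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def full_gp_mean(x_train, y_train, x_test, noise_var=0.01):
    """Exact GP predictive mean; solves an n x n system (cubic cost in n)."""
    k_nn = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_sn = rbf_kernel(x_test, x_train)
    return k_sn @ np.linalg.solve(k_nn, y_train)


def pseudo_point_gp_mean(x_train, y_train, x_test, pseudo_inputs,
                         noise_var=0.01, jitter=1e-6):
    """Subset-of-regressors predictive mean; solves only an m x m system."""
    k_mm = rbf_kernel(pseudo_inputs, pseudo_inputs)
    k_mn = rbf_kernel(pseudo_inputs, x_train)
    k_sm = rbf_kernel(x_test, pseudo_inputs)
    # m x m matrix: noise_var * K_mm + K_mn K_nm (jitter added for stability).
    a = noise_var * k_mm + k_mn @ k_mn.T + jitter * np.eye(len(pseudo_inputs))
    return k_sm @ np.linalg.solve(a, k_mn @ y_train)


# Synthetic one-dimensional data (assumed for illustration only).
rng = np.random.default_rng(0)
x_train = rng.uniform(-3.0, 3.0, size=(500, 1))
y_train = np.sin(x_train[:, 0]) + 0.1 * rng.standard_normal(500)
x_test = np.linspace(-3.0, 3.0, 50)[:, None]

# Ten pseudo inputs on a fixed grid; the article instead treats both their
# number and their locations as model parameters.
pseudo_inputs = np.linspace(-3.0, 3.0, 10)[:, None]

mean_full = full_gp_mean(x_train, y_train, x_test)
mean_sparse = pseudo_point_gp_mean(x_train, y_train, x_test, pseudo_inputs)
max_gap = float(np.max(np.abs(mean_full - mean_sparse)))
print(f"largest gap between exact and pseudo-point means: {max_gap:.4f}")
```

Here `max_gap` gives a rough sense of how much accuracy is traded for the smaller linear system on this toy problem. How that trade-off behaves on real data depends on where the pseudo inputs sit, which is the question the article addresses.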
Citation:
Pourhabib A, Liang F, Ding Y (2014) Bayesian site selection for fast Gaussian process regression. IIE Transactions 46: 543–555. Available: http://dx.doi.org/10.1080/0740817X.2013.849833.
Publisher:
Informa UK Limited
Journal:
IIE Transactions
KAUST Grant Number:
KUS-C1-016-04
Issue Date:
5-Feb-2014
DOI:
10.1080/0740817X.2013.849833
Type:
Article
ISSN:
0740-817X; 1545-8830
Sponsors:
Arash Pourhabib and Yu Ding were supported in part by NSF grants CMMI-0926803 and CMMI-1000088; Yu Ding was also supported by the NSF grant CMMI-0726939; Faming Liang's research was partially supported by NSF grants CMMI-0926803, DMS-1007457, and DMS-1106494 and an award (KUS-C1-016-04) made by King Abdullah University of Science and Technology.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Pourhabib, Arash | en
dc.contributor.author | Liang, Faming | en
dc.contributor.author | Ding, Yu | en
dc.date.accessioned | 2016-02-25T12:43:53Z | en
dc.date.available | 2016-02-25T12:43:53Z | en
dc.date.issued | 2014-02-05 | en
dc.identifier.citation | Pourhabib A, Liang F, Ding Y (2014) Bayesian site selection for fast Gaussian process regression. IIE Transactions 46: 543–555. Available: http://dx.doi.org/10.1080/0740817X.2013.849833. | en
dc.identifier.issn | 0740-817X | en
dc.identifier.issn | 1545-8830 | en
dc.identifier.doi | 10.1080/0740817X.2013.849833 | en
dc.identifier.uri | http://hdl.handle.net/10754/597658 | en
dc.description.abstract | Gaussian Process (GP) regression is a popular method in the field of machine learning and computer experiment designs; however, its ability to handle large data sets is hindered by the computational difficulty in inverting a large covariance matrix. Likelihood approximation methods were developed as a fast GP approximation, thereby reducing the computation cost of GP regression by utilizing a much smaller set of unobserved latent variables called pseudo points. This article reports a further improvement to the likelihood approximation methods by simultaneously deciding both the number and locations of the pseudo points. The proposed approach is a Bayesian site selection method where both the number and locations of the pseudo inputs are parameters in the model, and the Bayesian model is solved using a reversible jump Markov chain Monte Carlo technique. Through a number of simulated and real data sets, it is demonstrated that with appropriate priors chosen, the Bayesian site selection method can produce a good balance between computation time and prediction accuracy: it is fast enough to handle large data sets that a full GP is unable to handle, and it improves, quite often remarkably, the prediction accuracy, compared with the existing likelihood approximations. © 2014 Taylor and Francis Group, LLC. | en
dc.description.sponsorship | Arash Pourhabib and Yu Ding were supported in part by NSF grants CMMI-0926803 and CMMI-1000088; Yu Ding was also supported by the NSF grant CMMI-0726939; Faming Liang's research was partially supported by NSF grants CMMI-0926803, DMS-1007457, and DMS-1106494 and an award (KUS-C1-016-04) made by King Abdullah University of Science and Technology. | en
dc.publisher | Informa UK Limited | en
dc.subject | Bayesian model averaging | en
dc.subject | Gaussian process computation | en
dc.subject | large data sets | en
dc.subject | reversible jump MCMC | en
dc.title | Bayesian site selection for fast Gaussian process regression | en
dc.type | Article | en
dc.identifier.journal | IIE Transactions | en
dc.contributor.institution | Texas A and M University, College Station, United States | en
kaust.grant.number | KUS-C1-016-04 | en