Scalable Gaussian-process regression and variable selection using Vecchia approximations
dc.contributor.author | Cao, Jian | |
dc.contributor.author | Guinness, Joseph | |
dc.contributor.author | Genton, Marc G. | |
dc.contributor.author | Katzfuss, Matthias | |
dc.date.accessioned | 2022-05-16T11:22:08Z | |
dc.date.available | 2022-05-16T11:22:08Z | |
dc.date.issued | 2022-03-02 | |
dc.identifier.uri | http://hdl.handle.net/10754/677959 | |
dc.description.abstract | Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the numbers of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods. | |
dc.description.sponsorship | Jian Cao was partially supported by the Texas A&M Institute of Data Science (TAMIDS) Postdoctoral Project program, Jian Cao and Matthias Katzfuss by National Science Foundation (NSF) Grant DMS–1654083, Matthias Katzfuss and Joe Guinness by NSF Grant DMS–1953005, Matthias Katzfuss by NSF Grant CCF–1934904, and Jian Cao and Marc Genton were partially supported by the King Abdullah University of Science and Technology (KAUST). We would like to thank Felix Jimenez for his helpful comments and discussions. | |
dc.publisher | arXiv | |
dc.relation.url | https://arxiv.org/pdf/2202.12981.pdf | |
dc.rights | Archived with thanks to arXiv | |
dc.title | Scalable Gaussian-process regression and variable selection using Vecchia approximations | |
dc.type | Preprint | |
dc.contributor.department | Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division | |
dc.contributor.department | Extreme Computing Research Center | |
dc.contributor.department | Spatio-Temporal Statistics and Data Analysis Group | |
dc.contributor.department | Statistics Program | |
dc.eprint.version | Pre-print | |
dc.contributor.institution | Department of Statistics and Institute of Data Science, Texas A&M University | |
dc.contributor.institution | Department of Statistics, Cornell University | |
dc.contributor.institution | Department of Statistics, Texas A&M University | |
dc.identifier.arxivid | 2202.12981 | |
kaust.person | Genton, Marc G. | |
refterms.dateFOA | 2022-05-16T11:23:03Z |
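
Illustrative note on the abstract: the ordered conditional (Vecchia) approximation it mentions replaces the joint Gaussian density with a product of conditionals, each conditioning only on at most m previously ordered responses. The Python sketch below is a minimal illustration of that idea and is not the authors' VGPR implementation. The squared-exponential kernel, the per-covariate length scales, the noise value, the unscaled Euclidean neighbor search, the toy data, and all function names are assumptions made for the example.

```python
import math
import random

def sq_exp_kernel(x1, x2, lengthscales):
    """Squared-exponential covariance with one length scale per covariate (assumed kernel)."""
    d2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengthscales))
    return math.exp(-0.5 * d2)

def cov_block(X, idx, lengthscales, noise):
    """Covariance matrix of the responses indexed by idx, with noise on the diagonal."""
    return [[sq_exp_kernel(X[a], X[b], lengthscales) + (noise if a == b else 0.0)
             for b in idx] for a in idx]

def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gaussian_logpdf(y, cov):
    """Log-density of a zero-mean Gaussian vector y with covariance cov."""
    n = len(y)
    if n == 0:
        return 0.0
    L = cholesky(cov)
    z = []  # forward substitution: solve L z = y
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (quad + logdet + n * math.log(2.0 * math.pi))

def vecchia_loglik(X, y, lengthscales, m, noise=0.1):
    """Sum of log p(y_i | y_c(i)), where c(i) holds the m nearest earlier points."""
    total = 0.0
    for i in range(len(y)):
        # Conditioning set: up to m nearest neighbors among previously ordered points.
        prev = sorted(range(i), key=lambda j: math.dist(X[i], X[j]))[:m]
        joint = prev + [i]
        # log p(y_i | y_prev) = log p(y_joint) - log p(y_prev)
        total += gaussian_logpdf([y[j] for j in joint], cov_block(X, joint, lengthscales, noise))
        total -= gaussian_logpdf([y[j] for j in prev], cov_block(X, prev, lengthscales, noise))
    return total

# Toy data (hypothetical): the response depends on the first covariate only.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(8)]
y = [math.sin(3.0 * x[0]) + 0.1 * rng.gauss(0.0, 1.0) for x in X]
lengthscales = [0.5, 50.0]  # a very long length scale makes covariate 2 nearly irrelevant

exact = gaussian_logpdf(y, cov_block(X, list(range(len(y))), lengthscales, 0.1))
approx_small_m = vecchia_loglik(X, y, lengthscales, m=2)
approx_full_m = vecchia_loglik(X, y, lengthscales, m=len(y) - 1)
print(exact, approx_small_m, approx_full_m)
```

When m is at least n - 1, every conditioning set contains all earlier points and the product of conditionals reproduces the exact GP log-likelihood. Smaller m trades accuracy for cost, since each term then involves only an (m + 1) by (m + 1) covariance block.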
This item appears in the following Collection(s)
- Preprints
- Extreme Computing Research Center
- Statistics Program (for more information visit: https://stat.kaust.edu.sa/)
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division (for more information visit: https://cemse.kaust.edu.sa/)