Selecting the Number of Principal Components in Functional Data

Handle URI:
http://hdl.handle.net/10754/599572
Title:
Selecting the Number of Principal Components in Functional Data
Authors:
Li, Yehua; Wang, Naisyin; Carroll, Raymond J.
Abstract:
Functional principal component analysis (FPCA) has become the most widely used dimension reduction tool for functional data analysis. We consider functional data measured at random, subject-specific time points, contaminated with measurement error, allowing for both sparse and dense functional data, and propose novel information criteria to select the number of principal components in such data. We propose a Bayesian information criterion based on marginal modeling that can consistently select the number of principal components for both sparse and dense functional data. For dense functional data, we also develop an Akaike information criterion based on the expected Kullback-Leibler information under a Gaussian assumption. In connection with the time series literature, we also consider a class of information criteria proposed for factor analysis of multivariate time series and show that they are still consistent for dense functional data, if a prescribed undersmoothing scheme is undertaken in the FPCA algorithm. We perform intensive simulation studies and show that the proposed information criteria vastly outperform existing methods for this type of data. Surprisingly, our empirical evidence shows that our information criteria proposed for dense functional data also perform well for sparse functional data. An empirical example using colon carcinogenesis data is also provided to illustrate the results. Supplementary materials for this article are available online. © 2013 American Statistical Association.
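The abstract notes that information criteria from factor analysis of multivariate time series stay consistent for dense functional data. As a rough, hypothetical illustration only, the Python sketch below applies one well-known criterion of that type, IC_p1 of Bai and Ng (2002), to simulated curves that are all observed on a common dense grid. That shared grid lets ordinary PCA via the singular value decomposition stand in for FPCA. This is not the article's BIC or AIC. It omits the random, subject-specific time points, the kernel smoothing and the undersmoothing scheme the authors describe. The function name `bai_ng_icp1`, the sine/cosine eigenfunctions, the eigenvalues (4, 2, 1) and the noise level 0.5 are all assumptions made for the example. NumPy is assumed to be available.

```python
import numpy as np


def bai_ng_icp1(X, kmax):
    """Pick the number of components k in 0..kmax minimizing IC_p1(k).

    Hypothetical helper. Assumes every curve (row of the n x T matrix X)
    is observed on the same dense grid, unlike the article's setting.
    """
    n, T = X.shape
    Xc = X - X.mean(axis=0)                  # subtract the estimated mean function
    ss = np.linalg.svd(Xc, compute_uv=False) ** 2   # squared singular values, descending
    total = ss.sum()
    # Bai-Ng IC_p1 penalty per component: (n + T)/(nT) * ln(nT/(n + T))
    penalty = (n + T) / (n * T) * np.log(n * T / (n + T))
    ic = []
    for k in range(kmax + 1):
        v_k = (total - ss[:k].sum()) / (n * T)   # mean squared residual after k PCs
        ic.append(np.log(v_k) + k * penalty)
    ic = np.array(ic)
    return int(np.argmin(ic)), ic


# Simulated dense functional data with three true components (assumed setup).
rng = np.random.default_rng(0)
n, T = 200, 100
t = np.arange(T) / T
# Eigenfunctions orthonormal on [0, 1]; on this grid (1/T) * phi @ phi.T is the identity.
phi = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                 np.sqrt(2) * np.cos(2 * np.pi * t),
                 np.sqrt(2) * np.sin(4 * np.pi * t)])
lam = np.array([4.0, 2.0, 1.0])                       # eigenvalues (score variances)
scores = rng.normal(size=(n, 3)) * np.sqrt(lam)
X = t + scores @ phi + rng.normal(scale=0.5, size=(n, T))  # mean t(x) = x plus noise

k_hat, ic = bai_ng_icp1(X, kmax=10)
print(k_hat)  # prints 3
```

The design follows the usual pattern for this family. ln V(k) falls sharply only when a genuine component is added, while k · g(n, T) charges a fixed cost per component that shrinks as n and T grow. Applied to pure noise of the same size, the same function returns 0.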
Citation:
Li Y, Wang N, Carroll RJ (2013) Selecting the Number of Principal Components in Functional Data. Journal of the American Statistical Association 108: 1284–1294. Available: http://dx.doi.org/10.1080/01621459.2013.788980.
Publisher:
Informa UK Limited
Journal:
Journal of the American Statistical Association
KAUST Grant Number:
KUS-CI-016-04
Issue Date:
Dec-2013
DOI:
10.1080/01621459.2013.788980
PubMed ID:
24376287
PubMed Central ID:
PMC3872138
Type:
Article
ISSN:
0162-1459; 1537-274X
Sponsors:
Li's research was supported by the National Science Foundation (DMS-1105634, DMS-1317118). Wang's research was supported by a grant from the National Cancer Institute (CA74552). Carroll's research was supported by a grant from the National Cancer Institute (R37-CA057030) and by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST). The authors thank the associate editor and two anonymous referees for their constructive comments that led to significant improvements in the article.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Li, Yehua | en
dc.contributor.author | Wang, Naisyin | en
dc.contributor.author | Carroll, Raymond J. | en
dc.date.accessioned | 2016-02-28T05:53:35Z | en
dc.date.available | 2016-02-28T05:53:35Z | en
dc.date.issued | 2013-12 | en
dc.identifier.issn | 0162-1459 | en
dc.identifier.issn | 1537-274X | en
dc.identifier.pmid | 24376287 | en
dc.identifier.doi | 10.1080/01621459.2013.788980 | en
dc.identifier.uri | http://hdl.handle.net/10754/599572 | en
dc.publisher | Informa UK Limited | en
dc.subject | Akaike information criterion | en
dc.subject | Bayesian information criterion | en
dc.subject | Functional data analysis | en
dc.subject | Kernel smoothing | en
dc.subject | Model selection | en
dc.title | Selecting the Number of Principal Components in Functional Data | en
dc.type | Article | en
dc.identifier.journal | Journal of the American Statistical Association | en
dc.identifier.pmcid | PMC3872138 | en
dc.contributor.institution | Iowa State University, Ames, United States | en
dc.contributor.institution | University of Michigan, Ann Arbor, United States | en
dc.contributor.institution | Texas A&M University, College Station, United States | en
kaust.grant.number | KUS-CI-016-04 | en


All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.