Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier

Handle URI:
http://hdl.handle.net/10754/600134
Title:
Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier
Authors:
Carroll, Raymond J.; Delaigle, Aurore; Hall, Peter
Abstract:
The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually smooth functions that were not actually observed. Existing literature shows that this approach is effective, and even optimal, when using functional data methods for prediction or hypothesis testing. However, in the present paper we show that this approach is not effective in classification problems. There a useful rule of thumb is that undersmoothing is often desirable, but there are several surprising qualifications to that approach. First, the effect of smoothing the training data can be more significant than that of smoothing the new data set to be classified; second, undersmoothing is not always the right approach, and in fact in some cases using a relatively large bandwidth can be more effective; and third, these perverse results are the consequence of very unusual properties of error rates, expressed as functions of smoothing parameters. For example, the orders of magnitude of optimal smoothing parameter choices depend on the signs and sizes of terms in an expansion of error rate, and those signs and sizes can vary dramatically from one setting to another, even for the same classifier.
Citation:
Carroll RJ, Delaigle A, Hall P (2013) Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier. The Annals of Statistics 41: 2739–2767. Available: http://dx.doi.org/10.1214/13-AOS1158.
Publisher:
Institute of Mathematical Statistics
Journal:
The Annals of Statistics
KAUST Grant Number:
KUS-CI-016-04
Issue Date:
Dec-2013
DOI:
10.1214/13-AOS1158
PubMed ID:
25309640
PubMed Central ID:
PMC4191932
Type:
Article
ISSN:
0090-5364
Sponsors:
Supported by a Grant from the National Cancer Institute (R37-CA057030). This publication is based in part on work supported by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST). Supported by grants and fellowships from the Australian Research Council.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Carroll, Raymond J. | en
dc.contributor.author | Delaigle, Aurore | en
dc.contributor.author | Hall, Peter | en
dc.date.accessioned | 2016-02-28T06:43:25Z | en
dc.date.available | 2016-02-28T06:43:25Z | en
dc.date.issued | 2013-12 | en
dc.identifier.citation | Carroll RJ, Delaigle A, Hall P (2013) Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier. The Annals of Statistics 41: 2739–2767. Available: http://dx.doi.org/10.1214/13-AOS1158. | en
dc.identifier.issn | 0090-5364 | en
dc.identifier.pmid | 25309640 | en
dc.identifier.doi | 10.1214/13-AOS1158 | en
dc.identifier.uri | http://hdl.handle.net/10754/600134 | en
dc.description.abstract | The data functions that are studied in the course of functional data analysis are assembled from discrete data, and the level of smoothing that is used is generally that which is appropriate for accurate approximation of the conceptually smooth functions that were not actually observed. Existing literature shows that this approach is effective, and even optimal, when using functional data methods for prediction or hypothesis testing. However, in the present paper we show that this approach is not effective in classification problems. There a useful rule of thumb is that undersmoothing is often desirable, but there are several surprising qualifications to that approach. First, the effect of smoothing the training data can be more significant than that of smoothing the new data set to be classified; second, undersmoothing is not always the right approach, and in fact in some cases using a relatively large bandwidth can be more effective; and third, these perverse results are the consequence of very unusual properties of error rates, expressed as functions of smoothing parameters. For example, the orders of magnitude of optimal smoothing parameter choices depend on the signs and sizes of terms in an expansion of error rate, and those signs and sizes can vary dramatically from one setting to another, even for the same classifier. | en
dc.description.sponsorship | Supported by a Grant from the National Cancer Institute (R37-CA057030). This publication is based in part on work supported by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST). Supported by grants and fellowships from the Australian Research Council. | en
dc.publisher | Institute of Mathematical Statistics | en
dc.subject | Discrimination | en
dc.subject | Kernel Smoothing | en
dc.subject | Centroid Method | en
dc.subject | Training Data | en
dc.subject | Quadratic Discrimination | en
dc.subject | Smoothing Parameter Choice | en
dc.title | Unexpected properties of bandwidth choice when smoothing discrete data for constructing a functional data classifier | en
dc.type | Article | en
dc.identifier.journal | The Annals of Statistics | en
dc.identifier.pmcid | PMC4191932 | en
dc.contributor.institution | Department of Statistics, Texas A&M University, College Station, Texas 77843, USA; carroll@stat.tamu.edu | en
dc.contributor.institution | Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3010, Australia; A.Delaigle@ms.unimelb.edu.au, halpstat@ms.unimelb.edu.au | en
kaust.grant.number | KUS-CI-016-04 | en
