Robust estimation for homoscedastic regression in the secondary analysis of case-control data

Handle URI:
http://hdl.handle.net/10754/599522
Title:
Robust estimation for homoscedastic regression in the secondary analysis of case-control data
Authors:
Wei, Jiawei; Carroll, Raymond J.; Müller, Ursula U.; Keilegom, Ingrid Van; Chatterjee, Nilanjan
Abstract:
Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Citation:
Wei J, Carroll RJ, Müller UU, Keilegom IV, Chatterjee N (2012) Robust estimation for homoscedastic regression in the secondary analysis of case-control data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75: 185–206. Available: http://dx.doi.org/10.1111/j.1467-9868.2012.01052.x.
Publisher:
Wiley-Blackwell
Journal:
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
KAUST Grant Number:
KUS-CI-016-04
Issue Date:
4-Dec-2012
DOI:
10.1111/j.1467-9868.2012.01052.x
PubMed ID:
23637568
PubMed Central ID:
PMC3639015
Type:
Article
ISSN:
1369-7412
Sponsors:
This paper represents part of the first author's doctoral dissertation at Texas A&M University. Wei and Carroll's research was supported by a grant from the National Cancer Institute (R37-CA057030). Carroll was also supported by award KUS-CI-016-04, made by King Abdullah University of Science and Technology. Chatterjee's research was supported by a gene–environment initiative grant from the National Heart, Lung and Blood Institute (RO1-HL091172-01) and by the Intramural Research Program of the National Cancer Institute. Müller was supported by a National Science Foundation grant (DMS-0907014). Van Keilegom gratefully acknowledges financial support from Interuniversity Attraction Pole research network P6/03 of the Belgian Government (Belgian science policy), and from the European Research Council under the European Community's seventh framework programme (FP7/2007-2013), European Research Council grant agreement 203650.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Wei, Jiawei | en
dc.contributor.author | Carroll, Raymond J. | en
dc.contributor.author | Müller, Ursula U. | en
dc.contributor.author | Keilegom, Ingrid Van | en
dc.contributor.author | Chatterjee, Nilanjan | en
dc.date.accessioned | 2016-02-28T05:52:42Z | en
dc.date.available | 2016-02-28T05:52:42Z | en
dc.date.issued | 2012-12-04 | en
dc.identifier.citation | Wei J, Carroll RJ, Müller UU, Keilegom IV, Chatterjee N (2012) Robust estimation for homoscedastic regression in the secondary analysis of case-control data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75: 185–206. Available: http://dx.doi.org/10.1111/j.1467-9868.2012.01052.x. | en
dc.identifier.issn | 1369-7412 | en
dc.identifier.pmid | 23637568 | en
dc.identifier.doi | 10.1111/j.1467-9868.2012.01052.x | en
dc.identifier.uri | http://hdl.handle.net/10754/599522 | en
dc.description.abstract | Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach. | en
dc.description.sponsorship | This paper represents part of the first author's doctoral dissertation at Texas A&M University. Wei and Carroll's research was supported by a grant from the National Cancer Institute (R37-CA057030). Carroll was also supported by award KUS-CI-016-04, made by King Abdullah University of Science and Technology. Chatterjee's research was supported by a gene–environment initiative grant from the National Heart, Lung and Blood Institute (RO1-HL091172-01) and by the Intramural Research Program of the National Cancer Institute. Müller was supported by a National Science Foundation grant (DMS-0907014). Van Keilegom gratefully acknowledges financial support from Interuniversity Attraction Pole research network P6/03 of the Belgian Government (Belgian science policy), and from the European Research Council under the European Community's seventh framework programme (FP7/2007-2013), European Research Council grant agreement 203650. | en
dc.publisher | Wiley-Blackwell | en
dc.subject | Semiparametric Inference | en
dc.subject | Biased Samples | en
dc.subject | Homoscedastic Regression | en
dc.subject | Secondary Data | en
dc.subject | Secondary Phenotypes | en
dc.subject | Two-stage Samples | en
dc.title | Robust estimation for homoscedastic regression in the secondary analysis of case-control data | en
dc.type | Article | en
dc.identifier.journal | Journal of the Royal Statistical Society: Series B (Statistical Methodology) | en
dc.identifier.pmcid | PMC3639015 | en
dc.contributor.institution | Texas A&M University, College Station, USA. | en
kaust.grant.number | KUS-CI-016-04 | en