Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test

Handle URI:
http://hdl.handle.net/10754/598551
Title:
Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test
Authors:
Cai, T.; Lin, X.; Carroll, R. J.
Abstract:
In recent years, genome-wide association studies (GWAS) and gene-expression profiling have generated a large number of valuable datasets for assessing how genetic variations are related to disease outcomes. With such datasets, it is often of interest to assess the overall effect of a set of genetic markers, assembled based on biological knowledge. Genetic marker-set analyses have been advocated as more reliable and powerful approaches compared with the traditional marginal approaches (Curtis and others, 2005. Pathways to the analysis of microarray data. TRENDS in Biotechnology 23, 429-435; Efroni and others, 2007. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS One 2, 425). Procedures for testing the overall effect of a marker-set have been actively studied in recent years. For example, score tests derived under an Empirical Bayes (EB) framework (Liu and others, 2007. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics 63, 1079-1088; Liu and others, 2008. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics 9, 292; Wu and others, 2010. Powerful SNP-set analysis for case-control genome-wide association studies. American Journal of Human Genetics 86, 929) have been proposed as powerful alternatives to the standard Rao score test (Rao, 1948. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44, 50-57). The advantages of these EB-based tests are most apparent when the markers are correlated, due to the reduction in the degrees of freedom. 
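To illustrate the contrast between the Rao score test and an EB-based (variance-component) score test, here is a minimal Python sketch under assumptions of our own, not taken from the paper: a continuous trait, the linear model y_i = alpha + g_i'beta + e_i with Gaussian errors, and H0: beta = 0. The Rao statistic U'V^{-1}U is referred to a chi-square with p degrees of freedom. The EB-type statistic is taken to be the linear-kernel form U'U (up to a scaling constant), whose null distribution is a weighted sum of chi-square(1) variables with weights equal to the eigenvalues of V. The sketch also reports a Satterthwaite effective degrees of freedom (tr V)^2 / tr(V^2), one standard moment-matching approximation, which drops toward 1 as the markers become more correlated. The function names `solve` and `score_statistics` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x


def score_statistics(G, y):
    """Return (rao, eb, eff_df) for H0: beta = 0 in y_i = alpha + g_i'beta + e_i.

    G is an n x p list of marker values, y a list of n continuous outcomes.
    (Hypothetical illustration; assumes no marker is constant.)
    """
    n, p = len(y), len(G[0])
    ybar = sum(y) / n
    gbar = [sum(row[j] for row in G) / n for j in range(p)]
    Gc = [[row[j] - gbar[j] for j in range(p)] for row in G]
    r = [yi - ybar for yi in y]              # residuals under the null model
    sigma2 = sum(ri * ri for ri in r) / n    # null MLE of the error variance
    # Score (up to a factor 1/sigma2) and its null covariance V = sigma2 * Gc'Gc
    U = [sum(Gc[i][j] * r[i] for i in range(n)) for j in range(p)]
    V = [[sigma2 * sum(Gc[i][j] * Gc[i][k] for i in range(n)) for k in range(p)]
         for j in range(p)]
    # Rao score test: U'V^{-1}U, approximately chi-square with p df under H0
    rao = sum(u * x for u, x in zip(U, solve(V, U)))
    # EB-type linear-kernel statistic U'U (= r'GG'r): under H0 a weighted sum
    # of chi-square(1) variables, weights = eigenvalues of V
    eb = sum(u * u for u in U)
    # Satterthwaite effective df (tr V)^2 / tr(V^2); V is symmetric
    tr_v = sum(V[j][j] for j in range(p))
    tr_v2 = sum(V[j][k] ** 2 for j in range(p) for k in range(p))
    return rao, eb, tr_v * tr_v / tr_v2
```

With a single marker the Rao statistic reduces to n times the squared sample correlation and the effective df is exactly 1; with two nearly collinear markers the effective df stays close to 1 rather than 2, which is the degrees-of-freedom reduction the abstract refers to.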
In this paper, we propose an adaptive score test which up- or down-weights the contributions from each member of the marker-set based on the Z-scores of their effects. Such an adaptive procedure gains power over the existing procedures when the signal is sparse and the correlation among the markers is weak. By combining evidence from both the EB-based score test and the adaptive test, we further construct an omnibus test that attains good power in most settings. The null distributions of the proposed test statistics can be approximated well either via simple perturbation procedures or via distributional approximations. Through extensive simulation studies, we demonstrate that the proposed procedures perform well in finite samples. We apply the tests to a breast cancer genetic study to assess the overall effect of the FGFR2 gene on breast cancer risk.
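The abstract does not give the exact form of the adaptive statistic, the omnibus combination, or the perturbation scheme, so the Python sketch below only illustrates the general idea under assumptions of our own. It assumes a continuous trait with an intercept-only null model, and uses a hypothetical "top-k" family S_k (the sum of the k largest squared Z-scores) as the data-driven up- or down-weighting of markers. Null distributions are approximated by Gaussian-multiplier perturbation of per-subject score contributions, a standard resampling technique. The adaptive p-value is the minimum p-value over k, recalibrated with the same perturbation draws. The omnibus p-value takes the smaller of the adaptive and EB-type (U'U) p-values and recalibrates once more. The names `adaptive_and_omnibus`, `upper_p`, and `lower_p` are hypothetical; this is not the authors' procedure. Every marker is assumed to vary, since a constant marker would give a zero standard deviation.

```python
import random


def score_contributions(G, y):
    """u_ij = (g_ij - gbar_j) * (y_i - ybar): per-subject score contributions
    for marker j under an intercept-only null model (continuous trait)."""
    n, p = len(y), len(G[0])
    ybar = sum(y) / n
    gbar = [sum(row[j] for row in G) / n for j in range(p)]
    return [[(row[j] - gbar[j]) * (yi - ybar) for j in range(p)]
            for row, yi in zip(G, y)]


def upper_p(value, draws):
    """Empirical P(draw >= value) with the usual +1 correction."""
    return (1 + sum(d >= value for d in draws)) / (1 + len(draws))


def lower_p(value, draws):
    """Empirical P(draw <= value), used for min-p statistics."""
    return (1 + sum(d <= value for d in draws)) / (1 + len(draws))


def adaptive_and_omnibus(G, y, n_perturb=300, seed=0):
    """Return (p_adaptive, p_eb, p_omnibus) via Gaussian-multiplier perturbation."""
    contrib = score_contributions(G, y)
    p = len(contrib[0])
    # SD of each perturbed score given the data: sqrt(sum_i u_ij^2)
    sd = [sum(row[j] ** 2 for row in contrib) ** 0.5 for j in range(p)]

    def stats(mult):
        U = [sum(w * row[j] for w, row in zip(mult, contrib)) for j in range(p)]
        zsq = sorted(((U[j] / sd[j]) ** 2 for j in range(p)), reverse=True)
        topk, running = [], 0.0
        for v in zsq:  # S_k = sum of the k largest squared Z-scores
            running += v
            topk.append(running)
        return topk, sum(u * u for u in U)  # (S_1..S_p, EB-type U'U)

    rng = random.Random(seed)
    obs = stats([1.0] * len(y))  # unit multipliers reproduce the observed scores
    null = [stats([rng.gauss(0.0, 1.0) for _ in y]) for _ in range(n_perturb)]
    topk_null = [[d[0][k] for d in null] for k in range(p)]
    eb_null = [d[1] for d in null]

    def min_p(topk):
        # Adaptive step: keep the k whose S_k is most extreme
        return min(upper_p(topk[k], topk_null[k]) for k in range(p))

    null_minp = [min_p(d[0]) for d in null]
    p_adaptive = lower_p(min_p(obs[0]), null_minp)
    p_eb = upper_p(obs[1], eb_null)

    # Omnibus: smaller of the two p-values, recalibrated on the same draws
    null_omni = [min(lower_p(m, null_minp), upper_p(e, eb_null))
                 for m, e in zip(null_minp, eb_null)]
    p_omnibus = lower_p(min(p_adaptive, p_eb), null_omni)
    return p_adaptive, p_eb, p_omnibus
```

Reusing one set of perturbation draws for every statistic keeps the min-p and omnibus steps calibrated against the same approximate null. Note that the smallest attainable p-value is 1/(1 + n_perturb), so small significance levels require more draws.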
Citation:
Cai T, Lin X, Carroll RJ (2012) Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test. Biostatistics 13: 776–790. Available: http://dx.doi.org/10.1093/biostatistics/kxs015.
Publisher:
Oxford University Press (OUP)
Journal:
Biostatistics
KAUST Grant Number:
KUS-CI-016-04
Issue Date:
25-Jun-2012
DOI:
10.1093/biostatistics/kxs015
PubMed ID:
22734045
PubMed Central ID:
PMC3440238
Type:
Article
ISSN:
1465-4644; 1468-4357
Sponsors:
Research was supported by grants from the National Institutes of Health (R01-GM079330 to T. C.) and the National Science Foundation (DMS-0854970 to T. C.); the National Cancer Institute (R37-CA076404 and P01-CA134294 to X. L.); the National Cancer Institute (R37-CA057030 to R.J.C.) and Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST) to R.J.C.
Appears in Collections:
Publications Acknowledging KAUST Support

Full metadata record

DC Field | Value | Language
dc.contributor.author | Cai, T. | en
dc.contributor.author | Lin, X. | en
dc.contributor.author | Carroll, R. J. | en
dc.date.accessioned | 2016-02-25T13:32:01Z | en
dc.date.available | 2016-02-25T13:32:01Z | en
dc.date.issued | 2012-06-25 | en
dc.identifier.issn | 1465-4644 | en
dc.identifier.issn | 1468-4357 | en
dc.identifier.pmid | 22734045 | en
dc.identifier.doi | 10.1093/biostatistics/kxs015 | en
dc.identifier.uri | http://hdl.handle.net/10754/598551 | en
dc.publisher | Oxford University Press (OUP) | en
dc.subject | Adaptive procedures | en
dc.subject | Empirical Bayes | en
dc.subject | GWAS | en
dc.subject | Pathway analysis | en
dc.subject | Score test | en
dc.subject | SNP sets | en
dc.subject.mesh | Data Interpretation, Statistical | en
dc.subject.mesh | Phenotype | en
dc.title | Identifying genetic marker sets associated with phenotypes via an efficient adaptive score test | en
dc.type | Article | en
dc.identifier.journal | Biostatistics | en
dc.identifier.pmcid | PMC3440238 | en
dc.contributor.institution | Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA. tcai@hsph.harvard.edu | en
kaust.grant.number | KUS-CI-016-04 | en


All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.