Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA
Type
Article
KAUST Grant Number
KUS-CI-016-04
GRP-CF-2011-19-P-Gao-Huang
Date
2014-12-22
Online Publication Date
2014-12-22
Print Publication Date
2014-10-02
Permanent link to this record
http://hdl.handle.net/10754/597675
Abstract
In genome-wide association studies, the primary task is to detect biomarkers in the form of Single Nucleotide Polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and other important clinical or environmental factors. However, the number of SNPs is extremely large compared with the sample size, which inhibits the application of classical methods such as multiple logistic regression. Currently, the most commonly used approach is still to analyze one SNP at a time. In this paper, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit-transformed mean of the SNP genotypes as the sum of the SNP effects, the effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction and employ the L1 penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a Majorization-Minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank. The proposed method is applied to a Multiple Sclerosis data set and to simulated data sets and shows promise in biomarker detection.
Citation
Jung Y, Huang JZ, Hu J (2014) Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA. Journal of the American Statistical Association 109: 1355–1367. Available: http://dx.doi.org/10.1080/01621459.2014.928217.
Sponsors
Hu's work was partially supported by the National Institutes of Health Grants R21CA129671, R01GM080503, R01CA158113, and CGSG P30 CA016672. Huang's work was partially supported by grants from NSF (DMS-0907170, DMS-1007618, DMS-1208952), and Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). The authors thank the editor, the associate editor, and reviewers for many constructive comments.
Publisher
Informa UK Limited
PubMed ID
25642005
PubMed Central ID
PMC4310485
DOI
10.1080/01621459.2014.928217
Collections
Publications Acknowledging KAUST Support
Related articles
- Evaluating the ability of tree-based methods and logistic regression for the detection of SNP-SNP interaction.
  - Authors: García-Magariños M, López-de-Ullibarri I, Cao R, Salas A
  - Issue date: 2009 May
- Shared genetic factors for age at natural menopause in Iranian and European women.
  - Authors: Rahmani M, Earp MA, Ramezani Tehrani F, Ataee M, Wu J, Treml M, Nudischer R, P-Behnami S, ReproGen Consortium, Perry JR, Murabito JM, Azizi F, Brooks-Wilson A
  - Issue date: 2013 Jul
- Genome-wide association analysis by lasso penalized logistic regression.
  - Authors: Wu TT, Chen YF, Hastie T, Sobel E, Lange K
  - Issue date: 2009 Mar 15
- Accounting for linkage disequilibrium in genome-wide association studies: A penalized regression method.
  - Authors: Liu J, Wang K, Ma S, Huang J
  - Issue date: 2013 Jan 1
- Variable selection and estimation in generalized linear models with the seamless L(0) penalty.
  - Authors: Li Z, Wang S, Lin X
  - Issue date: 2012 Dec