Shrinkage Estimators for Robust and Efficient Inference in Haplotype-Based Case-Control Studies

Cited by: 53
Authors
Chen, Yi-Hau
Chatterjee, Nilanjan [1]
Carroll, Raymond J. [2]
Affiliations
[1] NCI, Div Canc Epidemiol & Genet, NIH, Dept Hlth & Human Serv, Rockville, MD 20852 USA
[2] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
Keywords
Empirical Bayes; Genetic epidemiology; LASSO (Least Absolute Shrinkage and Selection Operator); Model averaging; Model robustness; Model selection; GENE-ENVIRONMENT INDEPENDENCE; REGRESSION-MODEL; LINKAGE PHASE; GENOTYPE DATA; LIKELIHOOD; SELECTION; ASSOCIATIONS; TRAITS; TESTS; RISK;
DOI
10.1198/jasa.2009.0104
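Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
070103 [Probability Theory and Mathematical Statistics]; 140311 [Social Design and Social Innovation];
Abstract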
Case-control association studies often aim to investigate the role of genes and gene-environment interactions in terms of the underlying haplotypes (i.e., the combinations of alleles at multiple genetic loci along chromosomal regions). The goal of this article is to develop robust but efficient approaches to the estimation of disease odds-ratio parameters associated with haplotypes and haplotype-environment interactions. We consider "shrinkage" estimation techniques that can adaptively relax the model assumptions of Hardy-Weinberg equilibrium and gene-environment independence required by recently proposed efficient "retrospective" methods. Our proposal first involves the development of a novel retrospective approach to the analysis of case-control data, one that is robust to the nature of the gene-environment distribution in the underlying population. Next, it involves shrinkage of the robust retrospective estimator toward a more precise, but model-dependent, retrospective estimator using novel empirical Bayes and penalized regression techniques. Methods for variance estimation are proposed based on asymptotic theory. Simulations and two data examples illustrate both the robustness and efficiency of the proposed methods.
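As an illustration only, the idea of shrinking a robust estimate toward a more precise, model-dependent one can be sketched for a single scalar parameter. The weighting below is a generic empirical-Bayes-type rule and is an assumption of this sketch, not the estimator defined in the article; the function name `eb_shrinkage` and its arguments are hypothetical.

```python
def eb_shrinkage(beta_robust: float, beta_model: float, var_robust: float) -> float:
    """Scalar empirical-Bayes-type shrinkage (illustrative assumption, not the paper's formula).

    beta_robust: estimate that stays valid when the model assumptions fail
    beta_model:  more precise estimate that relies on those assumptions
    var_robust:  estimated variance of beta_robust
    """
    if var_robust < 0:
        raise ValueError("var_robust must be non-negative")
    # Squared gap between the two estimates, used as a rough gauge of model bias.
    psi_sq = (beta_robust - beta_model) ** 2
    denom = psi_sq + var_robust
    if denom == 0:
        # The estimates coincide and carry no variance: nothing to shrink.
        return beta_model
    # Weight on the robust estimate grows as the gap dominates its variance.
    weight = psi_sq / denom
    return weight * beta_robust + (1 - weight) * beta_model


if __name__ == "__main__":
    # Hypothetical numbers: small gap relative to variance, so the result sits near beta_model.
    print(eb_shrinkage(beta_robust=0.50, beta_model=0.40, var_robust=0.04))
```

When the two estimates agree closely relative to the variance of the robust one, the combined value stays near the model-dependent estimate; when they disagree strongly, it moves toward the robust estimate.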
Pages: 220-233
Page count: 14
Related papers
27 in total
[1]
Chang-Claude J, 2002, CANCER EPIDEM BIOMAR, V11, P698
[2]
Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies [J].
Chatterjee, N ;
Carroll, RJ .
BIOMETRIKA, 2005, 92 (02) :399-418
[3]
Maximum likelihood inference on a mixed conditionally and marginally specified regression model for genetic epidemiologic studies with two-phase sampling [J].
Chatterjee, Nilanjan ;
Chen, Yi-Hau .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2007, 69 :123-142
[4]
An asymptotic theory for model selection inference in general semiparametric problems [J].
Claeskens, Gerda ;
Carroll, Raymond J. .
BIOMETRIKA, 2007, 94 (02) :249-265
[5]
The role of haplotypes in candidate gene studies [J].
Clark, AG .
GENETIC EPIDEMIOLOGY, 2004, 27 (04) :321-333
[6]
Inference on haplotype effects in case-control studies using unphased genotype data [J].
Epstein, MP ;
Satten, GA .
AMERICAN JOURNAL OF HUMAN GENETICS, 2003, 73 (06) :1316-1329
[7]
Variable selection via nonconcave penalized likelihood and its oracle properties [J].
Fan, JQ ;
Li, RZ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (456) :1348-1360
[8]
The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute: History, organization, and status [J].
Gohagan, JK ;
Prorok, PC ;
Hayes, RB ;
Kramer, BS .
CONTROLLED CLINICAL TRIALS, 2000, 21 (06) :251S-272S
[9]
Hastie T., 2001, ELEMENTS STAT LEARNI
[10]
Frequentist model average estimators [J].
Hjort, NL ;
Claeskens, G .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2003, 98 (464) :879-899