Maximum likelihood inference on a mixed conditionally and marginally specified regression model for genetic epidemiologic studies with two-phase sampling

Cited by: 12
Authors
Chatterjee, Nilanjan
Chen, Yi-Hau
Affiliations
[1] NCI, Div Canc Epidemiol & Genet, Rockville, MD 20852 USA
[2] Acad Sinica, Taipei 115, Taiwan
Keywords
case-control studies; gene-environment interaction; missing data; outcome-dependent sampling; semiparametric methods;
DOI
10.1111/j.1467-9868.2007.00580.x
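Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code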
020208 ; 070103 ; 0714 ;
Abstract
Two-phase stratified sampling designs can reduce the cost of genetic epidemiologic studies by limiting expensive ascertainments of genetic and environmental exposure to an efficiently selected subsample (phase II) of the main study (phase I). Family history and some covariate information, which may be cheaply gathered for all subjects at phase I, can be used for sampling of informative subjects at phase II. We develop alternative maximum likelihood methods for analysis of data from such studies by using a novel regression model that permits the estimation of 'marginal' risk parameters that are associated with the genetic and environmental covariates of interest, while simultaneously characterizing the 'conditional' risk of the disease associated with family history after adjusting for the other covariates. The methods and appropriate asymptotic theories are developed with and without an assumption of gene-environment independence, allowing the distribution of the environmental factors to remain non-parametric. The performance of the alternative methods and of sampling strategies is studied by using simulated data involving rare and common genetic variants. An application of the methods proposed is illustrated by using a case-control study of colorectal adenoma embedded within the prostate, lung, colorectal and ovarian cancer screening trial.
Pages: 123-142
Number of pages: 20
Related papers
30 in total
[1]   BRESLOW NE, 1988, BIOMETRIKA, V75, P11
[2]   Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis [J].
Breslow, NE ;
Chatterjee, N .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 1999, 48 :457-468
[3]   Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling [J].
Breslow, NE ;
Holubkov, R .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1997, 59 (02) :447-461
[4]   Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies [J].
Chatterjee, N ;
Carroll, RJ .
BIOMETRIKA, 2005, 92 (02) :399-418
[5]   A pseudoscore estimator for regression problems with two-phase sampling [J].
Chatterjee, N ;
Chen, YH ;
Breslow, NE .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2003, 98 (461) :158-168
[6]   ANALYTIC METHODS FOR 2-STAGE CASE-CONTROL STUDIES AND OTHER STRATIFIED DESIGNS [J].
FLANDERS, WD ;
GREENLAND, S .
STATISTICS IN MEDICINE, 1991, 10 (05) :739-747
[7]   UNIQUE CONSISTENT SOLUTION TO LIKELIHOOD EQUATIONS [J].
FOUTZ, RV .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1977, 72 (357) :147-148
[8]   The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute: History, organization, and status [J].
Gohagan, JK ;
Prorok, PC ;
Hayes, RB ;
Kramer, BS .
CONTROLLED CLINICAL TRIALS, 2000, 21 (06) :251S-272S
[9]   A GENERALIZATION OF SAMPLING WITHOUT REPLACEMENT FROM A FINITE UNIVERSE [J].
HORVITZ, DG ;
THOMPSON, DJ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1952, 47 (260) :663-685
[10]   Semiparametric methods for response-selective and missing data problems in regression [J].
Lawless, JF ;
Kalbfleisch, JD ;
Wild, CJ .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1999, 61 :413-438