A critical look at methods for handling missing covariates in epidemiologic regression analyses

Citations: 666
Authors
Greenland, S [1]
Finkle, WD [1]
Affiliations
[1] CONSOLIDATED RES INC,LOS ANGELES,CA
Keywords
biostatistics; epidemiologic methods; logistic regression; missing data; odds ratio; relative risk;
DOI
10.1093/oxfordjournals.aje.a117592
Chinese Library Classification number
R1 [Preventive medicine, hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Epidemiologic studies often encounter missing covariate values. While simple methods such as stratification on missing-data status, conditional-mean imputation, and complete-subject analysis are commonly employed for handling this problem, several studies have shown that these methods can be biased under reasonable circumstances. The authors review these results in the context of logistic regression and present simulation experiments showing the limitations of the methods. The method based on missing-data indicators can exhibit severe bias even when the data are missing completely at random, and regression (conditional-mean) imputation can be inordinately sensitive to model misspecification. Even complete-subject analysis can outperform these methods. More sophisticated methods, such as maximum likelihood, multiple imputation, and weighted estimating equations, have been given extensive attention in the statistics literature. While these methods are superior to simple methods, they are not commonly used in epidemiology, no doubt due to their complexity and the lack of packaged software to apply them. The authors contrast the results of multiple imputation to simple methods in the analysis of a case-control study of endometrial cancer, and they find a meaningful difference in results for age at menarche. In general, the authors recommend that epidemiologists avoid using the missing-indicator method and use more sophisticated methods whenever a large proportion of data are missing.
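The abstract's claim that the missing-indicator method can be biased even when data are missing completely at random can be illustrated with a small deterministic calculation. The sketch below is not the authors' simulation: it replaces logistic regression with a Mantel-Haenszel odds ratio on expected cell proportions, uses one binary confounder, and every parameter value (the true odds ratio of 2, the confounder effect, the 50% missingness) is a hypothetical choice made only for illustration. The missing-indicator method is represented as treating "missing" as a third confounder stratum, which pools the confounder levels among subjects whose value is missing.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical parameters, chosen for illustration only (not from the paper).
TRUE_OR = 2.0                      # exposure-disease odds ratio within confounder levels
P_C = 0.5                          # prevalence of the binary confounder C
P_E_GIVEN_C = {0: 0.2, 1: 0.8}     # exposure prevalence by confounder level
INTERCEPT = -2.0
BETA_E = math.log(TRUE_OR)
BETA_C = math.log(5.0)             # confounder-disease log odds ratio
P_MISSING = 0.5                    # C is missing completely at random


def cell_prob(c, e, d):
    """Joint probability of confounder c, exposure e, and disease d."""
    p_c = P_C if c == 1 else 1.0 - P_C
    p_e = P_E_GIVEN_C[c] if e == 1 else 1.0 - P_E_GIVEN_C[c]
    p_d1 = expit(INTERCEPT + BETA_E * e + BETA_C * c)
    p_d = p_d1 if d == 1 else 1.0 - p_d1
    return p_c * p_e * p_d


def table_for(c_values, weight):
    """Expected exposure-by-disease table pooled over the given confounder levels."""
    return {
        (e, d): weight * sum(cell_prob(c, e, d) for c in c_values)
        for e in (0, 1)
        for d in (0, 1)
    }


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across a list of 2x2 tables."""
    num = den = 0.0
    for t in tables:
        n = sum(t.values())
        num += t[(1, 1)] * t[(0, 0)] / n   # exposed cases * unexposed noncases
        den += t[(1, 0)] * t[(0, 1)] / n   # exposed noncases * unexposed cases
    return num / den


observed = 1.0 - P_MISSING
complete_strata = [table_for([0], observed), table_for([1], observed)]
missing_stratum = table_for([0, 1], P_MISSING)   # C unknown, so its levels are pooled

crude_or = mantel_haenszel_or([table_for([0, 1], 1.0)])
complete_subject_or = mantel_haenszel_or(complete_strata)
missing_indicator_or = mantel_haenszel_or(complete_strata + [missing_stratum])

print("crude:", round(crude_or, 3))
print("complete-subject:", round(complete_subject_or, 3))
print("missing-indicator:", round(missing_indicator_or, 3))
```

Under these assumptions the complete-subject estimate recovers the stratum-specific odds ratio, while the missing-indicator estimate falls between that value and the confounded crude odds ratio, because the "missing" stratum contributes an unadjusted comparison.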
Pages: 1255-1264
Page count: 10