What do we do with missing data? Some options for analysis of incomplete data

Times cited: 259
Author
Raghunathan, TE [1]
Affiliations
[1] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Inst Social Res, Ann Arbor, MI 48109 USA
Keywords
available-case analysis; observed data likelihood; missing data mechanism; multiple imputation; nonresponse bias; weighting
DOI
10.1146/annurev.publhealth.25.102802.124410
Chinese Library Classification (CLC) number
R1 [Preventive medicine and hygiene]
Discipline classification code
1004; 120402
Abstract
Missing data are a pervasive problem in many public health investigations. The standard approach is to restrict the analysis to subjects with complete data on the variables involved in the analysis. Estimates from such an analysis can be biased, especially if the subjects included in the analysis differ systematically from those excluded in terms of one or more key variables. The severity of this bias is illustrated through a simulation study in a logistic regression setting. This article reviews three approaches for analyzing incomplete data. The first approach weights the subjects who are included in the analysis to compensate for those who were excluded because of missing values. The second approach is based on multiple imputation, in which each missing value is replaced by two or more plausible values. The final approach constructs the likelihood from the incomplete observed data. The same logistic regression example is used to illustrate the basic concepts and methodology. Some software packages for analyzing incomplete data are described.
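Multiple imputation, the second approach above, ends by combining the analyses of the completed data sets into one estimate. The following is a minimal sketch of the widely used combining rules (Rubin's rules), not necessarily the article's own presentation. It assumes each of the m completed-data analyses yields a point estimate and its squared standard error; the function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules (illustrative sketch).

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance of the pooled estimate
    if b == 0:
        df = math.inf                # imputations agree exactly: no extra uncertainty
    else:
        # Large-sample degrees of freedom for the t reference distribution
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


# Hypothetical log-odds ratios and squared SEs from five imputed data sets
q, t, df = pool_rubin([0.52, 0.48, 0.55, 0.50, 0.45],
                      [0.010, 0.012, 0.011, 0.009, 0.013])
print(f"estimate={q:.3f}  SE={math.sqrt(t):.4f}  df={df:.1f}")
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations; small-sample refinements of the degrees of freedom exist but are omitted here.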
Pages: 99-117
Number of pages: 19