Review: A gentle introduction to imputation of missing values

Cited by: 1857
Authors
Donders, A. Rogier T. [1]
van der Heijden, Geert J. M. G.
Stijnen, Theo
Moons, Karel G. M.
Affiliations
[1] Univ Utrecht, Ctr Biostat, NL-3508 TC Utrecht, Netherlands
[2] Univ Utrecht, Copernicus Inst, Dept Innovat Studies, NL-3508 TC Utrecht, Netherlands
[3] Univ Utrecht, Med Ctr, Julius Ctr Hlth Sci & Primary Care, NL-3508 TC Utrecht, Netherlands
[4] Erasmus Univ, Sch Med, Dept Epidemiol & Biostat, NL-3000 DR Rotterdam, Netherlands
Keywords
missing data; single imputation; multiple imputation; indicator method; bias; precision
DOI
10.1016/j.jclinepi.2006.01.014
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out. Imputation techniques are based on the idea that any subject in a study sample can be replaced by a new randomly chosen subject from the same source population. Imputing missing data on a variable means replacing each missing value with a value drawn from an estimate of the distribution of that variable. In single imputation, only one estimate is used. In multiple imputation, several estimates are used, reflecting the uncertainty in the estimation of this distribution. Under the general conditions of so-called missing at random and missing completely at random, both single and multiple imputation result in unbiased estimates of study associations. However, single imputation yields estimated standard errors that are too small, whereas multiple imputation yields correctly estimated standard errors and confidence intervals. In this article we explain why this is the case and use a simple simulation study to demonstrate our explanations. We also explain and illustrate why two frequently used methods to handle missing data, i.e., overall mean imputation and the missing-indicator method, almost always result in biased estimates. © 2006 Elsevier Inc. All rights reserved.
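
The contrast between single and multiple imputation described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the simulation used in the article: the data-generating model, the function names (`pool_rubin`, `impute_once`, `simulate`), and the use of a bootstrap refit to represent uncertainty in the imputation model are all illustrative choices. Values of `y` are made missing completely at random, imputed from a regression on a fully observed `x`, and the mean of `y` is estimated. The multiple-imputation estimates are pooled with Rubin's rules, which add the between-imputation variance to the average within-imputation variance.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    pooled = estimates.mean()
    within = variances.mean()          # average sampling variance
    between = estimates.var(ddof=1)    # variance of estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, np.sqrt(total)


def impute_once(x, y, missing, rng, resample):
    """Fill missing y with random draws around a fitted regression of y on x.

    With resample=True the regression is refitted on a bootstrap sample of
    the complete cases, so uncertainty about the imputation model itself
    carries over into the imputed values (an assumed, simple approximation).
    """
    obs = np.flatnonzero(~missing)
    if resample:
        obs = rng.choice(obs, size=obs.size, replace=True)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - (intercept + slope * x[obs]), ddof=2)
    completed = y.copy()
    n_mis = int(missing.sum())
    noise = rng.normal(0.0, resid_sd, n_mis)
    completed[missing] = intercept + slope * x[missing] + noise
    return completed


def simulate(n=200, frac_missing=0.5, m=20, seed=0):
    """Compare a naive single-imputation SE with a pooled multiple-imputation SE."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)
    missing = rng.random(n) < frac_missing   # missing completely at random
    y = np.where(missing, np.nan, y)

    # Single imputation: one completed data set, analysed as if fully observed.
    single = impute_once(x, y, missing, rng, resample=False)
    single_se = np.sqrt(single.var(ddof=1) / n)

    # Multiple imputation: m completed data sets, pooled with Rubin's rules.
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(x, y, missing, rng, resample=True)
        estimates.append(completed.mean())
        variances.append(completed.var(ddof=1) / n)
    mi_mean, mi_se = pool_rubin(estimates, variances)

    return {
        "single_mean": single.mean(),
        "single_se": single_se,
        "mi_mean": mi_mean,
        "mi_se": mi_se,
        "mi_within_se": np.sqrt(np.mean(variances)),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

In this sketch the single-imputation standard error treats the imputed values as if they had been observed, so it reflects only the within-data variance. The pooled multiple-imputation standard error can never be smaller than the average within-imputation standard error, because the between-imputation term is non-negative.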
Pages: 1087-1091
Page count: 5