Multiple Imputation for Missing Data: Fully Conditional Specification Versus Multivariate Normal Imputation

Cited by: 586
Authors
Lee, Katherine J. [1, 2]
Carlin, John B. [1, 2]
Affiliations
[1] Royal Childrens Hosp, Murdoch Childrens Res Inst, Clin Epidemiol & Biostat Unit, Melbourne, Vic, Australia
[2] Univ Melbourne, Dept Paediat, Fac Med Dent & Hlth Sci, Melbourne, Vic, Australia
Funding
UK Medical Research Council;
Keywords
data interpretation, statistical; epidemiologic methods; imputation; incomplete data; missing data; simulations; VALUES; UPDATE;
DOI
10.1093/aje/kwp425
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene];
Subject classification code
1004; 120402;
Abstract
Statistical analysis in epidemiologic studies is often hindered by missing data, and multiple imputation is increasingly being used to handle this problem. In a simulation study, the authors compared 2 methods for imputation that are widely available in standard software: fully conditional specification (FCS) or "chained equations" and multivariate normal imputation (MVNI). The authors created data sets of 1,000 observations to simulate a cohort study, and missing data were induced under 3 missing-data mechanisms. Imputations were performed using FCS (Royston's "ice") and MVNI (Schafer's NORM) in Stata (Stata Corporation, College Station, Texas), with transformations or prediction matching being used to manage nonnormality in the continuous variables. Inferences for a set of regression parameters were compared between these approaches and a complete-case analysis. As expected, both FCS and MVNI were generally less biased than complete-case analysis, and both produced similar results despite the presence of binary and ordinal variables that clearly did not follow a normal distribution. Ignoring skewness in a continuous covariate led to large biases and poor coverage for the corresponding regression parameter under both approaches, although inferences for other parameters were largely unaffected. These results provide reassurance that similar results can be expected from FCS and MVNI in a standard regression analysis involving variously scaled variables.
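The abstract compares inferences from multiply imputed data sets but does not spell out how the per-data-set results are combined. In multiple imputation this is conventionally done with Rubin's rules, so the following is a minimal sketch of that pooling step only, not the authors' Stata code. The function name `pool_rubin` and the input numbers are illustrative assumptions and do not come from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one regression coefficient across m imputed data sets.

    estimates: point estimates of the coefficient, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance) following Rubin's rules.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical coefficient estimates from m = 3 imputed data sets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The same pooling applies whether the imputations come from FCS or MVNI; the two approaches differ only in how the imputed values are generated.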
Pages: 624-632
Page count: 9