Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: a simulation assessment

Citations: 112
Authors
Demirtas, Hakan [1]
Freels, Sally A. [1]
Yucel, Recai M. [2]
Affiliations
[1] Univ Illinois, Div Epidemiol & Biostat MC 923, Chicago, IL 60612 USA
[2] Univ Massachusetts, Dept Biostat & Epidemiol, Sch Publ Hlth & Hlth Sci, Amherst, MA 01003 USA
Keywords
multivariate normality; multiple imputation; symmetry; skewness; multimodality
DOI
10.1080/10629360600903866
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Multiple imputation under the assumption of multivariate normality has emerged in recent years as a frequently used model-based approach to incomplete continuous data. Despite its simplicity and popularity, however, its plausibility has not been thoroughly evaluated via simulation. In this work, the performance of multiple imputation under a multivariate Gaussian model with unstructured covariances was examined on a broad range of simulated incomplete data sets that exhibit varying distributional characteristics, such as skewness and multimodality, that are not accommodated by a Gaussian model. The behavior of efficiency and accuracy measures was explored to determine the extent to which the procedure works properly. The conclusion drawn is that, although real data rarely conform to multivariate normality, imputation under the assumption of normality is a fairly reasonable tool even when that assumption is clearly violated and the fraction of missing information is high, especially when the sample size is relatively large. Although we discourage its uncritical, automatic and possibly inappropriate use, we report that its performance is better than we expected, leading us to believe that it is probably an underrated approach.
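The kind of experiment the abstract describes can be illustrated with a minimal sketch. It is not the authors' simulation design: it assumes a bivariate setting with one fully observed variable `x`, one skewed (lognormal) outcome `y` with values missing completely at random, and a normal linear regression imputation model standing in for the full multivariate Gaussian model. The function names, sample size, missingness rate and number of imputations are all illustrative assumptions.

```python
import numpy as np

def impute_normal_once(x, y, miss, rng):
    """Draw one proper imputation of y given x under a normal linear model."""
    obs = ~miss
    X = np.column_stack([np.ones(obs.sum()), x[obs]])
    beta_hat, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
    resid = y[obs] - X @ beta_hat
    df = obs.sum() - 2
    # Draw the residual variance from its posterior under a noninformative prior.
    sigma2 = resid @ resid / rng.chisquare(df)
    # Draw the regression coefficients given the sampled variance.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(beta_hat, cov)
    # Fill in the missing y values with predictions plus normal noise.
    y_imp = y.copy()
    Xm = np.column_stack([np.ones(miss.sum()), x[miss]])
    y_imp[miss] = Xm @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return y_imp

def rubin_pool(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)            # pooled point estimate
    ubar = np.mean(variances)            # within-imputation variance
    b = np.var(estimates, ddof=1)        # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b  # total variance

def simulate(n=500, miss_rate=0.3, m=10, seed=0):
    """Compare the pooled mean of y with the mean of the complete data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = np.exp(0.5 * x + rng.normal(scale=0.5, size=n))  # skewed, non-Gaussian
    miss = rng.random(n) < miss_rate                      # MCAR missingness
    ests, vars_ = [], []
    for _ in range(m):
        y_imp = impute_normal_once(x, y, miss, rng)
        ests.append(y_imp.mean())
        vars_.append(y_imp.var(ddof=1) / n)
    qbar, total = rubin_pool(ests, vars_)
    return y.mean(), qbar, np.sqrt(total)

if __name__ == "__main__":
    full_mean, pooled_mean, pooled_se = simulate()
    print(f"complete-data mean: {full_mean:.3f}")
    print(f"pooled MI mean:     {pooled_mean:.3f} (SE {pooled_se:.3f})")
```

Repeating `simulate` over many seeds, sample sizes and missingness rates, and tracking bias and interval coverage, would approximate the accuracy and efficiency measures mentioned in the abstract.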
Pages: 69-84
Number of pages: 16