CHOOSING AMONG IMPUTATION TECHNIQUES FOR INCOMPLETE MULTIVARIATE DATA - A SIMULATION STUDY

Cited by: 12
Authors
BELLO, AL [1]
Affiliations
[1] UNIV OXFORD,DEPT STAT,OXFORD OX1 3TG,ENGLAND
Keywords
IMPUTATION TECHNIQUES; IMPUTED DATA MATRIX; EM ALGORITHM; PRINCIPAL COMPONENT ANALYSIS; SINGULAR VALUE DECOMPOSITION;
DOI
10.1080/03610929308831061
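Chinese Library Classification (CLC) number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;
Abstract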
A wide variety of strategies for coping with the problem of missing values, which frequently arises in multivariate data, have been proposed and tried over the years. One popular and important strategy is to estimate the missing values themselves in some way, usually achieved by imputation techniques. By means of Monte Carlo simulations, this paper investigates the relative performance of five deterministic imputation techniques using normal and non-normal data with several factors that may affect their efficiency. The imputation techniques are: mean substitution method (MSM), EM algorithm (EM), Dear's principal component method (DPC), general iterative principal component method (GIP) and singular value decomposition method (SVD). GIP is a refined, iterative version of DPC, developed to overcome certain problems with the latter. Although the results indicate that no single imputation technique is best across all combinations of the factors studied, MSM and DPC behave erratically; when the intercorrelation among the variables is moderate or high, they perform worse than the iterative imputation techniques (EM, SVD, and GIP), which, under this condition, are equally efficient. An illustrative real data example is given.
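To make the contrast between a one-pass and an iterative technique concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's procedure: the function names, the `rank`, `n_iter` and `tol` parameters, and the toy data are all assumptions made for illustration, and the SVD routine is a generic iterative low-rank refill that may differ from the SVD method evaluated in the study.

```python
import numpy as np


def mean_substitution(X):
    """Replace each missing entry (NaN) with the observed mean of its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def iterative_svd_impute(X, rank=1, n_iter=200, tol=1e-8):
    """Generic iterative low-rank imputation (illustrative, hypothetical helper).

    Starts from mean substitution, then repeatedly refills only the missing
    cells from a rank-`rank` SVD reconstruction of the centred matrix.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = mean_substitution(X)  # starting values
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        updated = np.where(missing, approx, filled)  # observed cells untouched
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


# Toy data (assumed, not from the paper): four highly intercorrelated
# variables driven by one latent factor, with about 10% of cells deleted.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X_full = np.column_stack([z + 0.1 * rng.normal(size=200) for _ in range(4)])
mask = rng.random(X_full.shape) < 0.1
X_miss = X_full.copy()
X_miss[mask] = np.nan

X_mean = mean_substitution(X_miss)
X_svd = iterative_svd_impute(X_miss, rank=1)


def rmse(X_imp):
    """Root mean squared error over the deleted cells only."""
    return float(np.sqrt(np.mean((X_imp[mask] - X_full[mask]) ** 2)))


print("RMSE, mean substitution:", rmse(X_mean))
print("RMSE, iterative SVD:    ", rmse(X_svd))
```

On data constructed this way, mean substitution ignores the correlation between variables, whereas the iterative refill borrows information from the observed cells in the same row, which is the kind of situation in which the abstract reports the iterative techniques doing better.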
Pages: 853-877
Number of pages: 25