MULTIPLE-IMPUTATION INFERENCES WITH UNCONGENIAL SOURCES OF INPUT

Cited by: 527
Author
MENG, XL
Affiliation
[1] Department of Statistics, University of Chicago, Chicago, IL, 60637
Keywords
CONGENIALITY; SELF EFFICIENCY; IMPORTANCE SAMPLING; INCOMPLETE DATA; MISSING DATA; NONRESPONSE; NORMALIZING CONSTANTS; PUBLIC USE DATA FILE; RANDOMIZATION
DOI
10.1214/ss/1177010269
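Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract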
Conducting sample surveys, imputing incomplete observations, and analyzing the resulting data are three indispensable phases of modern practice with public-use data files and with many other statistical applications. Each phase inherits different input, including the information preceding it and the intellectual assessments available, and aims to provide output that is one step closer to arriving at statistical inferences with scientific relevance. However, the role of the imputation phase has often been viewed as merely providing computational convenience for users of data. Although facilitating computation is very important, such a viewpoint ignores the imputer's assessments and information inaccessible to the users. This view underlies the recent controversy over the validity of multiple-imputation inference when a procedure for analyzing multiply imputed data sets cannot be derived from (is "uncongenial" to) the model adopted for multiple imputation. Given sensible imputations and complete-data analysis procedures, inferences from standard multiple-imputation combining rules are typically superior to, and thus different from, users' incomplete-data analyses. The latter may suffer from serious nonresponse biases because such analyses often must rely on convenient but unrealistic assumptions about the nonresponse mechanism. When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules. These points are reviewed and explored by simple examples and general theory, from both Bayesian and frequentist perspectives, particularly from the randomization perspective. Some convenient terms are suggested for facilitating communication among researchers from different perspectives when evaluating multiple-imputation inferences with uncongenial sources of input.
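For readers unfamiliar with the "standard combining rules" mentioned in the abstract, the following is a minimal sketch of the widely used (unweighted) rules for a scalar estimand, commonly attributed to Rubin. It is not taken from the paper itself and does not implement the importance-weighted variant the abstract alludes to. The function name `combine_rubin` and its argument names are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m completed-data analyses of a scalar estimand.

    estimates: point estimates, one per imputed data set
    variances: the corresponding completed-data variance estimates
    Returns (q_bar, u_bar, b, t). Hypothetical helper, for illustration only.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and matching variances")
    q_bar = mean(estimates)        # combined point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the combined estimate
    return q_bar, u_bar, b, t


if __name__ == "__main__":
    # Made-up numbers, used only to exercise the function.
    print(combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The between-imputation term is inflated by the factor 1 + 1/m to account for using a finite number m of imputations.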
Pages: 538-558
Number of pages: 21
Related papers
58 in total
[1] BELIN TR, 1993, J AM STAT ASSOC, V88, P1149, DOI 10.2307/2290812
[2] Bellhouse D.R., 1988, HDB STAT, V6, P1, DOI 10.1016/S0169-7161(88)06003-1
[3] Bowley, AL. Address to the Economic Science and Statistics Section of the British Association for the Advancement of Science, York, 1906 [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY, 1906, 69: 540-558
[4] CLOGG, CC; RUBIN, DB; SCHENKER, N; SCHULTZ, B; WEIDMAN, L. MULTIPLE IMPUTATION OF INDUSTRY AND OCCUPATION CODES IN CENSUS PUBLIC-USE SAMPLES USING BAYESIAN LOGISTIC-REGRESSION [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1991, 86 (413): 68-78
[5] DOREY FJ, 1990, 1990 JOINT STAT M AN
[6] EFRON, B. MISSING DATA, IMPUTATION, AND THE BOOTSTRAP [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1994, 89 (426): 463-475
[7] ERICSON WA, 1969, J ROY STAT SOC B, V31, P195
[8] FAY RE, 1991, 1991 P ANN RES C WAS, P429
[9] Fay Robert E., 1992, P SURVEY RES METHODS, V81, P227
[10] HEITJAN, DF; RUBIN, DB. INFERENCE FROM COARSE DATA VIA MULTIPLE IMPUTATION WITH APPLICATION TO AGE HEAPING [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1990, 85 (410): 304-314