An Akaike information criterion for model selection in the presence of incomplete data

Cited by: 42
Authors
Cavanaugh, JE
Shumway, RH
Affiliations
[1] Univ Missouri, Dept Stat, Columbia, MO 65211 USA
[2] Univ Calif Davis, Div Stat, Livermore, CA 95616 USA
Keywords
AIC; EM algorithm; information theory; Kullback-Leibler information; model selection criteria; PDIO criterion; SEM algorithm
DOI
10.1016/S0378-3758(97)00115-8
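Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208 [Statistics]; 070103 [Probability Theory and Mathematical Statistics]; 0714 [Statistics]
Abstract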
We derive and investigate a variant of AIC, the Akaike information criterion, for model selection in settings where the observed data is incomplete. Our variant is based on the motivation provided for the PDIO ('predictive divergence for incomplete observation models') criterion of Shimodaira (1994, in: Selecting Models from Data: Artificial Intelligence and Statistics IV, Lecture Notes in Statistics, vol. 89, Springer, New York, pp. 21-29). However, our variant differs from PDIO in its 'goodness-of-fit' term. Unlike AIC and PDIO, which require the computation of the observed-data empirical log-likelihood, our criterion can be evaluated using only complete-data tools, readily available through the EM algorithm and the SEM ('supplemented' EM) algorithm of Meng and Rubin (Journal of the American Statistical Association 86 (1991) 899-909). We compare the performance of our AIC variant to that of both AIC and PDIO in simulations where the data being modeled contains missing values. The results indicate that our criterion is less prone to overfitting than AIC and less prone to underfitting than PDIO. © 1998 Elsevier Science B.V. All rights reserved.
Pages: 45-65
Number of pages: 21