Model selection for incomplete and design-based samples

Cited by: 56
Authors
Hens, N. [1]
Aerts, M. [1]
Molenberghs, G. [1]
Affiliations
[1] Univ Hasselt, Ctr Stat, B-3590 Diepenbeek, Belgium
Keywords
missing data; weighted likelihood; model selection; complex designs; Akaike information criterion
DOI
10.1002/sim.2559
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07 [Science]; 0710 [Biology]; 09 [Agriculture]
Abstract
The Akaike information criterion (AIC) is one of the most frequently used methods for selecting one or a few good, optimal regression models from a set of candidate models. When the sample is incomplete, naive use of this criterion on the so-called complete cases can lead to the selection of poor or inappropriate models. A similar problem occurs when a sample based on a design with unequal selection probabilities is treated as a simple random sample. In this paper, we consider a modification of AIC based on reweighting the sample, in analogy with the weighted Horvitz-Thompson estimates. It is shown that this weighted AIC criterion provides better model choices for both incomplete and design-based samples. The use of the weighted AIC criterion is illustrated on data from the Belgian Health Interview Survey, which motivated this research. Simulations show its performance in a variety of settings. Copyright (c) 2006 John Wiley & Sons, Ltd.
Pages: 2502-2520
Number of pages: 19
Related papers
31 in total
[1] Aerts M, 1999, J AM STAT ASSOC, V94, P869
[2] Agostinelli, C. Robust model selection in regression via weighted likelihood methodology [J]. STATISTICS & PROBABILITY LETTERS, 2002, 56 (03): 289-300
[3] Agostinelli C, 2001, STAT SINICA, V11, P499
[4] Akaike H., 1973, Selected Papers of Hirotugu Akaike, P267
[5] Breiman, L. Random forests [J]. MACHINE LEARNING, 2001, 45 (01): 5-32
[6] Burnham K. P., 2002, MODEL SELECTION MULT, DOI 10.1007/b97636
[7] Cavanaugh, JE; Shumway, RH. An Akaike information criterion for model selection in the presence of incomplete data [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1998, 67 (01): 45-65
[8] FLANDERS, WD; GREENLAND, S. ANALYTIC METHODS FOR 2-STAGE CASE-CONTROL STUDIES AND OTHER STRATIFIED DESIGNS [J]. STATISTICS IN MEDICINE, 1991, 10 (05): 739-747
[9] HENS N, 2002, ARCH PUBLIC HEALTH, V60, P275
[10] HORVITZ, DG; THOMPSON, DJ. A GENERALIZATION OF SAMPLING WITHOUT REPLACEMENT FROM A FINITE UNIVERSE [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1952, 47 (260): 663-685