LINEAR-MODEL SELECTION BY CROSS-VALIDATION

Cited by: 1213
Author
SHAO, J
Institution
Keywords
BALANCED INCOMPLETE; CONSISTENCY; DATA SPLITTING; MODEL ASSESSMENT; MONTE CARLO; PREDICTION;
DOI
10.2307/2290328
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the C_p, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞. This is a somewhat shocking discovery, because n_v/n → 1 is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-n_v-out cross-validation method are provided, and results from a simulation study are presented.
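The leave-n_v-out procedure described in the abstract can be approximated by a Monte Carlo version: repeatedly draw a random validation set of size n_v (with n_v/n close to 1), fit each candidate submodel on the remaining observations, and select the submodel with the smallest average squared prediction error. Below is a minimal illustrative sketch in Python/NumPy; the function name mccv_score, the candidate-model enumeration, the number of random splits, and the toy data are assumptions made for illustration only, not the paper's notation or experimental design.

```python
import numpy as np
from itertools import combinations

def mccv_score(X, y, cols, n_v, n_splits=200, rng=None):
    """Monte Carlo leave-n_v-out cross-validation score for the submodel
    that uses the predictor columns in `cols`.

    Each split reserves n_v observations for validation and fits ordinary
    least squares on the remaining n - n_v observations.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    errs = []
    for _ in range(n_splits):
        val = rng.choice(n, size=n_v, replace=False)          # validation indices
        tr = np.setdiff1d(np.arange(n), val)                  # construction indices
        Xc, Xv = X[np.ix_(tr, cols)], X[np.ix_(val, cols)]
        beta, *_ = np.linalg.lstsq(Xc, y[tr], rcond=None)     # OLS fit on construction set
        errs.append(np.mean((y[val] - Xv @ beta) ** 2))       # squared prediction error
    return np.mean(errs)

# Toy example (assumed data): only the first two of five predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# n_v/n close to 1: most observations are held out for validation.
n_v = n - 25
candidates = [list(c) for r in range(1, p + 1) for c in combinations(range(p), r)]
best = min(candidates, key=lambda cols: mccv_score(X, y, cols, n_v, rng=1))
print("selected predictors:", best)
```

Reusing the same seed for every candidate model makes all submodels face identical random splits, so their scores are directly comparable; the paper's balanced-incomplete and analytic variants of leave-n_v-out cross-validation are not shown here.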
Pages: 486-494
Number of pages: 9
Related papers
19 items in total
[1] AKAIKE, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, AC-19(6): 716-723.
[2] ALLEN, D. M. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 1974, 16(1): 125-127.
[3] BURMAN, P. Biometrika, 1989, 76: 503. DOI: 10.2307/2336116
[6] GEISSER, S. The predictive sample reuse method with applications. Journal of the American Statistical Association, 1975, 70(350): 320-328.
[7] GUNST, R. Regression Analysis and Its Application. 1980. DOI: 10.1201/9780203741054
[8] HERZBERG, A. M. Utilitas Mathematica, 1986, 29: 209.
[9] JOHN, P. W. M. Statistical Design and Analysis of Experiments. 1971.