LINEAR-MODEL SELECTION BY CROSS-VALIDATION

Cited by: 1213
Author
SHAO, J
Institution
Keywords
BALANCED INCOMPLETE; CONSISTENCY; DATA SPLITTING; MODEL ASSESSMENT; MONTE CARLO; PREDICTION;
DOI
10.2307/2290328
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the C_p, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞. This is a somewhat shocking discovery, because n_v/n → 1 is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-n_v-out cross-validation method are provided, and results from a simulation study are presented.
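The leave-n_v-out procedure described in the abstract can be approximated by a Monte Carlo version: repeatedly draw a random validation set of size n_v (with n_v/n close to 1), fit each candidate submodel on the remaining observations, and select the submodel with the smallest average squared prediction error. Below is a minimal illustrative sketch in Python/NumPy; the function name mccv_score, the candidate-model enumeration, the number of random splits, and the toy data are assumptions made for illustration only, not the paper's notation or experimental design.

```python
import numpy as np
from itertools import combinations

def mccv_score(X, y, cols, n_v, n_splits=200, rng=None):
    """Monte Carlo leave-n_v-out cross-validation score for the submodel
    that uses the predictor columns in `cols`.

    Each split reserves n_v observations for validation and fits ordinary
    least squares on the remaining n - n_v observations.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    errs = []
    for _ in range(n_splits):
        val = rng.choice(n, size=n_v, replace=False)          # validation indices
        tr = np.setdiff1d(np.arange(n), val)                  # construction indices
        Xc, Xv = X[np.ix_(tr, cols)], X[np.ix_(val, cols)]
        beta, *_ = np.linalg.lstsq(Xc, y[tr], rcond=None)     # OLS fit on construction set
        errs.append(np.mean((y[val] - Xv @ beta) ** 2))       # squared prediction error
    return np.mean(errs)

# Toy example (assumed data): only the first two of five predictors matter.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# n_v/n close to 1: most observations are held out for validation.
n_v = n - 25
candidates = [list(c) for r in range(1, p + 1) for c in combinations(range(p), r)]
best = min(candidates, key=lambda cols: mccv_score(X, y, cols, n_v, rng=1))
print("selected predictors:", best)
```

Reusing the same seed for every candidate model makes all submodels face identical random splits, so their scores are directly comparable; the paper's balanced-incomplete and analytic variants of leave-n_v-out cross-validation are not shown here.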
Pages: 486-494
Number of pages: 9
Related papers
19 items in total
[1] AKAIKE, H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, AC-19(6): 716-723.
[2] ALLEN, D. M. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 1974, 16(1): 125-127.
[3] BURMAN, P. Biometrika, 1989, 76: 503. DOI: 10.2307/2336116
[6] GEISSER, S. The predictive sample reuse method with applications. Journal of the American Statistical Association, 1975, 70(350): 320-328.
[7] GUNST, R. Regression Analysis and Its Application. 1980. DOI: 10.1201/9780203741054
[8] HERZBERG, A. M. Utilitas Mathematica, 1986, 29: 209.
[9] JOHN, P. W. M. Statistical Design and Analysis of Experiments. 1971.