A survey of cross-validation procedures for model selection

Cited by: 2669
Authors
Arlot, Sylvain [1]
Celisse, Alain [2]
Affiliations
[1] CNRS, Willow Project Team, Lab Informat, CNRS ENS INRIA UMR 8548, Ecole Normale Super, 23 Ave Italie, F-75214 Paris 13, France
[2] Univ Lille 1, CNRS, UMR 8524, Lab Math Paul Painleve, F-59655 Villeneuve, France
Keywords
Model selection; cross-validation; leave-one-out
DOI
10.1214/09-SS054
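Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract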
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on the model selection performance of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. In conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
Pages: 40-79
Page count: 40