A survey of cross-validation procedures for model selection

Cited by: 2669
Authors
Arlot, Sylvain [1]
Celisse, Alain [2]
Affiliations
[1] CNRS, Willow Project Team, Lab Informat, CNRS ENS INRIA UMR 8548,Ecole Normale Super, 23 Ave Italie, F-75214 Paris 13, France
[2] Univ Lille 1, CNRS, UMR 8524, Lab Math Paul Painleve, F-59655 Villeneuve, France
Keywords
Model selection; cross-validation; leave-one-out;
DOI
10.1214/09-SS054
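Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes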
020208 ; 070103 ; 0714 ;
Abstract
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.
Pages: 40-79
Page count: 40