Improvements on cross-validation: The .632+ bootstrap method

Cited by: 1237
Authors
Efron, B [1]
Tibshirani, R [1]
Affiliations
[1] UNIV TORONTO, DEPT PREVENT MED & BIOSTAT, TORONTO, ON M5S 1A8, CANADA
Keywords
classification; cross-validation bootstrap; prediction rule;
DOI
10.2307/2965703
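Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 [Statistics]; 070103 [Probability Theory and Mathematical Statistics]; 0714 [Statistics];
Abstract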
A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? This is an important question both for comparing models and for assessing a final selected model. The traditional answer to this question is given by cross-validation. The cross-validation estimate of prediction error is nearly unbiased but can be highly variable. Here we discuss bootstrap estimates of prediction error, which can be thought of as smoothed versions of cross-validation. We show that a particular bootstrap method, the .632+ rule, substantially outperforms cross-validation in a catalog of 24 simulation experiments. Besides providing point estimates, we also consider estimating the variability of an error rate estimate. All of the results here are nonparametric and apply to any possible prediction rule; however, we study only classification problems with 0-1 loss in detail. Our simulations include "smooth" prediction rules like Fisher's linear discriminant function and unsmooth ones like nearest neighbors.
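The abstract names the .632+ rule without stating it. As a rough illustration only, the sketch below combines three quantities in the way the estimator is usually written: the training (resubstitution) error, the leave-one-out bootstrap error, and a no-information error rate for 0-1 loss. The function and argument names (`no_information_rate`, `err_632_plus`, `err_train`, `err_boot`, `gamma`) are hypothetical choices made for this example and do not come from the record. Computing the leave-one-out bootstrap error itself is left out and assumed to be supplied by the caller.

```python
from collections import Counter


def no_information_rate(y_true, y_pred):
    """Error rate expected if labels and predictions were independent (0-1 loss).

    gamma = sum over classes c of p_c * (1 - q_c), where p_c is the observed
    share of class c among the labels and q_c is its share among the predictions.
    """
    n = len(y_true)
    p = Counter(y_true)
    q = Counter(y_pred)  # Counter returns 0 for classes that were never predicted
    return sum((p[c] / n) * (1 - q[c] / n) for c in p)


def err_632_plus(err_train, err_boot, gamma):
    """Combine training error and leave-one-out bootstrap error (.632+ form).

    err_train: resubstitution error on the training set
    err_boot:  leave-one-out bootstrap error (assumed precomputed)
    gamma:     no-information error rate
    """
    # Cap the bootstrap error at the no-information rate.
    err_boot_capped = min(err_boot, gamma)

    # Relative overfitting rate, defined as 0 when there is no apparent overfitting.
    if err_boot_capped > err_train and gamma > err_train:
        r = (err_boot_capped - err_train) / (gamma - err_train)
    else:
        r = 0.0

    # Plain .632 estimate, then the correction that grows with overfitting.
    err_632 = 0.368 * err_train + 0.632 * err_boot
    correction = (err_boot_capped - err_train) * (0.368 * 0.632 * r) / (1 - 0.368 * r)
    return err_632 + correction
```

With no overfitting (`err_boot == err_train`) the result reduces to the training error, and with a fully overfit rule (`err_train == 0`, `err_boot == gamma`) it moves all the way to the bootstrap error, which is the behaviour the weighting is designed to produce.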
Pages: 548-560
Page count: 13