Estimating generalization error on two-class datasets using out-of-bag estimates

Cited by: 128

Authors:

Bylander, T [1]

Affiliation:

[1] Univ Texas, Div Comp Sci, San Antonio, TX 78249 USA

Source:

MACHINE LEARNING | 2002 / Vol. 48 / Issue 1-3

Keywords:

bagging; cross-validation; generalization error;

DOI:

10.1023/A:1013964023376

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For two-class datasets, we provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error on two-class datasets. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have similar performance, but are both biased. We can eliminate most of the bias in the out-of-bag estimate and increase accuracy by incorporating a correction based on the distribution of the out-of-bag votes.
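As an illustration of the mechanism the abstract describes, the following is a minimal sketch of a plain out-of-bag error estimate for a two-class bag. It is not the author's implementation and does not include the vote-distribution correction the abstract mentions. The decision-stump base learner, the function names (`fit_stump`, `predict_stump`, `oob_error`), the synthetic data, and the tie-breaking rule are all assumptions made for this example.

```python
import numpy as np

def fit_stump(X, y):
    """Fit a one-feature threshold classifier (an assumed stand-in base learner)."""
    best = (np.inf, 0, 0.0, 1)  # (training error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) > 0, 1, 0)
                err = np.mean(pred != y)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]

def predict_stump(stump, X):
    """Predict class 0 or 1 with a fitted stump."""
    j, t, pol = stump
    return np.where(pol * (X[:, j] - t) > 0, 1, 0)

def oob_error(X, y, n_predictors=50, seed=0):
    """Return (out-of-bag error, vote table) for a bag of stumps.

    votes[i, c] counts the predictors that voted class c on example i
    while example i was omitted from their bootstrap sample.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    votes = np.zeros((n, 2), dtype=int)
    for _ in range(n_predictors):
        idx = rng.integers(0, n, size=n)        # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # examples omitted from this sample
        stump = fit_stump(X[idx], y[idx])
        if oob.size:
            votes[oob, predict_stump(stump, X[oob])] += 1
    voted = votes.sum(axis=1) > 0               # skip examples that were never out-of-bag
    oob_pred = votes.argmax(axis=1)             # ties go to class 0 (an arbitrary choice)
    return float(np.mean(oob_pred[voted] != y[voted])), votes

# Hypothetical two-class data: two Gaussian clusters in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
err, votes = oob_error(X, y)
print(f"out-of-bag error estimate: {err:.3f}")
```

Each predictor is trained once, and its votes on the omitted examples are recorded as a by-product, which is why this estimate needs no additional predictors beyond the bag itself. The `votes` table is the kind of per-example vote distribution on which a correction such as the one described in the abstract would operate.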

Citation

Pages: 287-297

Page count: 11
