Estimating generalization error on two-class datasets using out-of-bag estimates

Cited by: 128
Author
Bylander, T [1]
Affiliation
[1] Univ Texas, Div Comp Sci, San Antonio, TX 78249 USA
Keywords
bagging; cross-validation; generalization error
DOI
10.1023/A:1013964023376
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For two-class datasets, we provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error on two-class datasets. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have similar performance, but are both biased. We can eliminate most of the bias in the out-of-bag estimate and increase accuracy by incorporating a correction based on the distribution of the out-of-bag votes.
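The out-of-bag procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the decision-stump base learner (`train_stump`), the function name `oob_error`, the tie-breaking rule, and the default of 50 predictors are all assumptions made for illustration. The sketch computes only the uncorrected out-of-bag estimate; the bias correction based on the distribution of the out-of-bag votes is not specified in the abstract and is not attempted here.

```python
import random


def train_stump(X, y):
    """Hypothetical base learner: the best single-feature threshold rule.

    Returns a function mapping an example to class 0 or 1.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (0, 1):
                # Rule: predict `sign` when x[f] <= t, otherwise 1 - sign.
                errors = sum(
                    (sign if x[f] <= t else 1 - sign) != label
                    for x, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda x: sign if x[f] <= t else 1 - sign


def oob_error(X, y, n_predictors=50, seed=0):
    """Uncorrected out-of-bag error estimate for a bag on a two-class dataset.

    Returns (estimate, votes), where votes[i][c] counts the out-of-bag
    votes for class c on training example i.
    """
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]
    for _ in range(n_predictors):
        # Bootstrap sample: n indices drawn with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        predictor = train_stump([X[i] for i in sample], [y[i] for i in sample])
        # Record this predictor's vote only on examples it never saw.
        for i in range(n):
            if i not in in_bag:
                votes[i][predictor(X[i])] += 1
    # Score only examples that received at least one out-of-bag vote.
    scored = [(v, label) for v, label in zip(votes, y) if sum(v) > 0]
    if not scored:
        raise ValueError("no example received an out-of-bag vote")
    # Assumption: ties in the vote are broken in favor of class 0.
    wrong = sum((1 if v[1] > v[0] else 0) != label for v, label in scored)
    return wrong / len(scored), votes


if __name__ == "__main__":
    # Toy one-feature dataset: class 0 for values 0..9, class 1 for 10..19.
    X = [[i] for i in range(20)]
    y = [0] * 10 + [1] * 10
    estimate, _ = oob_error(X, y)
    print(f"out-of-bag error estimate: {estimate:.3f}")
```

No predictors beyond the bag itself are trained, which is the source of the time saving over 10-fold cross-validation noted in the abstract: the votes are collected as a by-product of building the bag.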
Pages: 287-297
Number of pages: 11