An efficient method to estimate bagging's generalization error

Cited by: 125

Authors:

Wolpert, DH

Macready, WG

Affiliations:

[1] NASA, Ames Res Ctr, Caelum Res, Moffett Field, CA 94035 USA

[2] Bios Grp, LP, Santa Fe, NM 87501 USA

Source:

MACHINE LEARNING | 1999 / Vol. 35 / Issue 01

Keywords:

bagging; cross-validation; stacking; generalization error; bootstrap;

DOI:

10.1023/A:1007519102914

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bagging (Breiman, 1994a) is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set (Efron & Tibshirani, 1993; Efron, 1979). The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation, one needs to train the underlying algorithm on the order of mv times, where m is the size of the training set and v is the number of replicates. This paper presents several techniques for estimating the generalization error of a bagged learning algorithm without invoking yet more training of the underlying learning algorithm (beyond that of the bagging itself), as is required by cross-validation-based estimation. These techniques all exploit the bias-variance decomposition (Geman, Bienenstock & Doursat, 1992; Wolpert, 1996). The best of our estimators also exploits stacking (Wolpert, 1992). In a set of experiments reported here, it was found to be more accurate than both the alternative cross-validation-based estimator of the bagged algorithm's error and the cross-validation-based estimator of the underlying algorithm's error. This improvement was particularly pronounced for small test sets. This suggests a novel justification for using bagging: more accurate estimation of the generalization error than is possible without bagging.
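The cost argument in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator: it only shows plain bagging with leave-one-out cross-validation and counts how often the base learner is trained. The base learner `fit_mean`, the helper names, the synthetic data, and the squared-error loss are all assumptions made for illustration.

```python
import random

def fit_mean(sample):
    """Toy base learner (an assumption): always predicts the mean target of its sample."""
    ys = [y for _, y in sample]
    mean = sum(ys) / len(ys)
    return lambda x: mean

def bag(train, v, fit, rng, counter):
    """Train `fit` on v bootstrap replicates of `train` and average the predictions."""
    models = []
    for _ in range(v):
        # A bootstrap replicate: len(train) points drawn with replacement.
        replicate = [rng.choice(train) for _ in train]
        models.append(fit(replicate))
        counter[0] += 1  # one more training run of the base learner
    return lambda x: sum(model(x) for model in models) / len(models)

def loo_cv_error(train, v, fit, rng, counter):
    """Leave-one-out squared-error estimate for the bagged predictor."""
    total = 0.0
    for i, (x, y) in enumerate(train):
        held_in = train[:i] + train[i + 1:]  # drop the i-th point
        predictor = bag(held_in, v, fit, rng, counter)
        total += (predictor(x) - y) ** 2
    return total / len(train)

rng = random.Random(0)
m, v = 20, 10  # training-set size and number of replicates
train = [(i / m, 2.0 * (i / m) + rng.gauss(0.0, 0.1)) for i in range(m)]
counter = [0]
error = loo_cv_error(train, v, fit_mean, rng, counter)
training_runs = counter[0]
print(training_runs)  # 200, i.e. m * v training runs of the base learner
```

Each of the m held-out points requires a fresh bag of v models, so the base learner is trained m times v times in total. This is the overhead that the estimators described in the abstract are designed to avoid.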

References

Pages: 41-55

Page count: 15

11 entries in total

[1]

BREIMAN L, 1994, 416 TR U CAL DEP STA

[2]

BREIMAN L, 1996, OUT OF BAG ESTIMATIO

[3]

BREIMAN L, 1994, 421 TR U CAL DEP STA

[4] COMPUTERS AND THE THEORY OF STATISTICS - THINKING THE UNTHINKABLE [J].