A comparison of decision tree ensemble creation techniques

Times cited: 313
Authors
Banfield, Robert E.
Hall, Lawrence O.
Bowyer, Kevin W.
Kegelmeyer, W. P.
Affiliations
[1] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[3] Sandia Natl Labs, Biosyst Res, Livermore, CA 94551 USA
Funding
U.S. National Science Foundation;
Keywords
classifier ensembles; bagging; boosting; random forests; random subspaces; performance evaluation;
DOI
10.1109/TPAMI.2007.250609
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We experimentally evaluate bagging and seven other randomization-based approaches to creating an ensemble of decision tree classifiers. Statistical tests were performed on experimental results from 57 publicly available data sets. When cross-validation comparisons were tested for statistical significance, the best method was statistically more accurate than bagging on only eight of the 57 data sets. Alternatively, examining the average ranks of the algorithms across the group of data sets, we find that boosting, random forests, and randomized trees are statistically significantly better than bagging. Because our results suggest that using an appropriate ensemble size is important, we introduce an algorithm that decides when a sufficient number of classifiers has been created for an ensemble. Our algorithm uses the out-of-bag error estimate, and is shown to result in an accurate ensemble for those methods that incorporate bagging into the construction of the ensemble.
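The abstract compares the methods by their average ranks across the 57 data sets. The minimal Python sketch below shows one way such ranks and the associated Friedman statistics can be computed. It assumes the rank-based procedure described by Demsar [10] (rank 1 = most accurate method on a data set, tied methods share the mean rank); the abstract does not say exactly which tests the authors ran. The function names `rank_row`, `average_ranks` and `friedman_statistics` are hypothetical, this is not the authors' code, and the post-hoc comparison against bagging is omitted.

```python
from typing import List, Sequence, Tuple


def rank_row(scores: Sequence[float]) -> List[float]:
    """Rank the methods on one data set: rank 1 = highest accuracy.

    Tied methods share the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the last sorted position j whose score ties with position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def average_ranks(acc: Sequence[Sequence[float]]) -> List[float]:
    """acc[d][m] is the accuracy of method m on data set d."""
    n_sets = len(acc)
    totals = [0.0] * len(acc[0])
    for row in acc:
        for m, r in enumerate(rank_row(row)):
            totals[m] += r
    return [t / n_sets for t in totals]


def friedman_statistics(avg: Sequence[float], n_sets: int) -> Tuple[float, float]:
    """Friedman chi-square and the Iman-Davenport F statistic (as in Demsar, 2006).

    F is compared against an F distribution with k-1 and (k-1)(N-1)
    degrees of freedom, where k = number of methods and N = n_sets.
    """
    k = len(avg)
    chi2 = 12 * n_sets / (k * (k + 1)) * (
        sum(r * r for r in avg) - k * (k + 1) ** 2 / 4
    )
    denom = n_sets * (k - 1) - chi2
    # denom reaches 0 only when every data set gives the same untied ranking.
    f_stat = float("inf") if denom <= 1e-12 else (n_sets - 1) * chi2 / denom
    return chi2, f_stat


if __name__ == "__main__":
    # Made-up accuracies: 3 data sets (rows) x 3 methods (columns).
    acc = [[0.90, 0.85, 0.80], [0.70, 0.75, 0.75], [0.88, 0.88, 0.80]]
    avg = average_ranks(acc)
    print(avg, friedman_statistics(avg, len(acc)))
```

With these made-up accuracies, `average_ranks` returns roughly `[1.83, 1.67, 2.50]`; a lower average rank means a method was more accurate on more of the data sets. Ranking per data set, rather than averaging raw accuracies, keeps a few data sets with large accuracy differences from dominating the comparison.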
Pages: 173-180
Page count: 8
Related papers
23 entries in total
[1] Alpaydin, E. Combined 5 x 2 cv F test for comparing supervised classification learning algorithms. Neural Computation, 1999, 11(8): 1885-1892.
[2] Banfield, R. OpenDT, 2005.
[3] Banfield, R. E. Lecture Notes in Computer Science, 2003, 2709: 306.
[4] Banfield, R. E. P 2006 INT C SYST MA, 2006.
[5] Bauer, E.; Kohavi, R. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 1999, 36(1-2): 105-139.
[6] Breiman, L. Annals of Statistics, 1998, 26: 841.
[7] Breiman, L. Random forests. Machine Learning, 2001, 45(1): 5-32.
[9] Breiman, L. Annals of Statistics, 1998, 26: 801.
[10] Demsar, J. Journal of Machine Learning Research, 2006, 7: 1.