Efficient algorithms for decision tree cross-validation

Cited by: 97
Authors
Blockeel, H [1]
Struyf, J [1]
Affiliation
[1] Katholieke Univ Leuven, Dept Comp Sci, B-3001 Louvain, Belgium
Keywords
decision trees; cross-validation; inductive logic programming
DOI
10.1162/jmlr.2003.3.4-5.621
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. We identify a number of parameters that influence the obtainable speedups, and validate and refine our analysis with experiments on a variety of data sets with two different implementations. Besides cross-validation, we also briefly explore the usefulness of these techniques for bagging. We conclude with some guidelines concerning when these optimizations should be considered.
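The abstract does not spell out how the integration works. One plausible reading, offered here only as an illustration, is that the statistics needed to evaluate a candidate test are gathered once for all folds instead of once per fold. The minimal sketch below assumes that reading: it counts (test outcome, class) pairs per fold in a single pass and derives each fold's training-set counts by subtraction. The function names and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def training_counts_shared(outcomes, labels, folds, k):
    """Derive per-fold training-set counts from a single pass over the data."""
    per_fold = [Counter() for _ in range(k)]
    for outcome, label, fold in zip(outcomes, labels, folds):
        per_fold[fold][(outcome, label)] += 1

    # Counts over the full data set, obtained by summing the fold counts.
    total = Counter()
    for counts in per_fold:
        total.update(counts)

    # The training set of fold i is everything except fold i,
    # so its counts are the totals minus that fold's counts.
    training = []
    for counts in per_fold:
        remaining = Counter(total)
        remaining.subtract(counts)
        training.append(+remaining)  # unary plus drops zero entries
    return training


def training_counts_naive(outcomes, labels, folds, k):
    """Baseline: rescan the data once per fold (k passes in total)."""
    training = []
    for i in range(k):
        counts = Counter()
        for outcome, label, fold in zip(outcomes, labels, folds):
            if fold != i:
                counts[(outcome, label)] += 1
        training.append(counts)
    return training


# Toy data (hypothetical): outcome of one candidate test, class label, fold id.
outcomes = [True, False, True, True, False, False]
labels = ["a", "b", "a", "b", "b", "a"]
folds = [0, 1, 2, 0, 1, 2]

shared = training_counts_shared(outcomes, labels, folds, 3)
naive = training_counts_naive(outcomes, labels, folds, 3)
```

Both functions produce the same counts. The shared version evaluates the candidate test on each example once, which is where a saving could come from when test evaluation is expensive.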
Pages: 621-650
Page count: 30