ConfDTree: A Statistical Method for Improving Decision Trees

Cited by: 5
Authors
Gilad Katz [1 ,2 ]
Asaf Shabtai [1 ,2 ]
Lior Rokach [1 ,2 ]
Nir Ofek [1 ,2 ]
Affiliations
[1] Department of Information Systems Engineering, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
[2] Telekom Innovation Laboratories, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
Keywords
decision tree; confidence interval; imbalanced dataset
DOI
None available
Chinese Library Classification
TP181 [Automated Reasoning, Machine Learning]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Decision trees have three main disadvantages: reduced performance when the training set is small; rigid decision criteria; and the fact that a single "uncharacteristic" attribute might "derail" the classification process. In this paper we present ConfDTree (Confidence-Based Decision Tree), a post-processing method that enables decision trees to better classify outlier instances. This method, which can be applied to any decision tree algorithm, uses easy-to-implement statistical methods (confidence intervals and two-proportion tests) to identify hard-to-classify instances and to propose alternative routes. The experimental study indicates that the proposed post-processing method consistently and significantly improves the predictive performance of decision trees, particularly for small, imbalanced, or multi-class datasets, for which an average improvement of 5% to 9% in AUC performance is reported.
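The abstract names two elementary statistical tools: a confidence interval around a node's class proportion and a two-proportion test for comparing branches. The sketch below is a minimal, hypothetical illustration of those two computations only; it is not the authors' implementation, and the interval type (normal approximation), the z threshold, and the node statistics are all assumptions.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a class proportion.
    (Illustrative choice; the paper may use a different interval.)"""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (max(0.0, p - half), min(1.0, p + half))

def two_proportion_z(s1, n1, s2, n2):
    """Two-proportion z-test statistic for H0: p1 == p2, using pooled variance."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0

# Hypothetical node statistics: majority-class counts in two candidate branches.
left_hits, left_total = 18, 30
right_hits, right_total = 4, 25

lo, hi = proportion_ci(left_hits, left_total)
z = two_proportion_z(left_hits, left_total, right_hits, right_total)
print(f"left-branch CI: ({lo:.2f}, {hi:.2f}), two-proportion z = {z:.2f}")
# If |z| falls below the chosen significance threshold, the branches'
# class proportions are statistically indistinguishable, which is the kind
# of evidence one could use to flag an instance as hard to classify and
# consider an alternative route down the tree.
```

In the spirit of the abstract, such tests would run as a post-processing step on an already-induced tree, leaving the underlying induction algorithm unchanged.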
Pages: 392-407
Page count: 16