Coverage-based resampling: Building robust consolidated decision trees

Cited by: 27
Authors
Ibarguren, Igor [1]
Perez, Jesus M. [1]
Muguerza, Javier [1]
Gurrutxaga, Ibai [1]
Arbelaitz, Olatz [1]
Affiliations
[1] Univ Basque Country UPV EHU, Dept Comp Architecture & Technol, Donostia San Sebastian 20018, Spain
Keywords
Comprehensibility; Consolidated decision trees; Class imbalance; Resampling; Inner ensembles; CLASS IMBALANCE; STATISTICAL COMPARISONS; DATA SETS; CLASSIFICATION; CLASSIFIERS; DATASETS; TRENDS; SMOTE;
DOI
10.1016/j.knosys.2014.12.023
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The class imbalance problem has attracted a lot of attention from the data mining community recently, becoming a current trend in machine learning research. The Consolidated Tree Construction (CTC) algorithm was proposed as an algorithm to solve a classification problem involving a high degree of class imbalance without losing the explaining capacity, a desirable characteristic of single decision trees and rule sets. CTC works by resampling the training sample and building a tree from each subsample, in a similar manner to ensemble classifiers, but applying the ensemble process during the tree construction phase, resulting in a unique final tree. In the ECML/PKDD 2013 conference the term "Inner Ensembles" was coined to refer to such methodologies. In this paper we propose a resampling strategy for classification algorithms that use multiple subsamples. This strategy is based on the class distribution of the training sample to ensure a minimum representation of all classes when resampling. This strategy has been applied to CTC over different classification contexts. A robust classification algorithm should not just be able to rank in the top positions for certain classification problems but should be able to excel when faced with a broad range of problems. In this paper we establish the robustness of the CTC algorithm against a wide set of classification algorithms with explaining capacity. (C) 2015 Elsevier B.V. All rights reserved.
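The abstract does not spell out the resampling strategy, so the following is only a minimal sketch of one way to guarantee a minimum representation of every class in each subsample. It assumes balanced subsamples drawn without replacement, and it is not the authors' algorithm. The function name `stratified_subsamples` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsamples(labels, n_subsamples, per_class, seed=0):
    """Return n_subsamples lists of indices into `labels`.

    Assumption (not taken from the paper): every subsample holds the same
    number of examples of each class, drawn without replacement, and that
    number is capped at the size of the rarest class.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Never ask for more examples than the rarest class can supply.
    k = min(per_class, min(len(indices) for indices in by_class.values()))

    subsamples = []
    for _ in range(n_subsamples):
        sample = []
        for indices in by_class.values():
            sample.extend(rng.sample(indices, k))
        subsamples.append(sample)
    return subsamples
```

Under these assumptions, an algorithm that builds one tree per subsample would receive subsamples in which no class is missing, whatever the class distribution of the full training sample.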
Pages: 51-67
Page count: 17