Evolutionary rule-based systems for imbalanced data sets

Cited by: 147
Authors
Orriols-Puig, Albert [1]
Bernadó-Mansilla, Ester [1]
Affiliation
[1] Univ Ramon Llull, Grp Recerca Sistemes Intelligents Engn & Arquitec, Barcelona 08022, Spain
Keywords
Imbalanced data; Rule-based systems; Data preprocessing; Classification; Tests
DOI
10.1007/s00500-008-0319-7
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper investigates the capabilities of evolutionary on-line rule-based systems, also called learning classifier systems (LCSs), for extracting knowledge from imbalanced data. While some learners may suffer from class imbalances and from instances sparsely distributed across the feature space, we show that LCSs are flexible methods that can be adapted to detect such cases and find suitable models. Results on artificial data sets specifically designed to test the capabilities of LCSs on imbalanced data show that LCSs are able to extract knowledge from highly imbalanced domains. On real-world problems, LCSs prove to be among the most robust methods compared with instance-based learners, decision trees, and support vector machines. Moreover, all the learners benefit from re-sampling techniques. Although no single re-sampling technique performs best on all data sets and for all learners, those based on over-sampling seem to perform better on average. The paper adapts and analyzes LCSs for challenging imbalanced data sets and lays the groundwork for further study of which combination of re-sampling technique and learner is best suited to a specific kind of problem.
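The abstract reports that over-sampling-based re-sampling techniques tend to perform better on average. As a generic illustration only, the sketch below shows two common over-sampling schemes in plain Python. It is not the authors' implementation and is not taken from the paper. The first scheme is random over-sampling, which duplicates minority instances. The second is a simplified generator in the spirit of SMOTE, which interpolates between a minority instance and one of its nearest minority neighbours. The function names `random_oversample` and `smote_like` and their parameters (`seed`, `n_new`, `k`) are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Random over-sampling (illustrative sketch): duplicate randomly chosen
    instances of every smaller class until each class is as large as the
    biggest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(members)))  # copy, not alias
            y_out.append(label)
    return X_out, y_out


def smote_like(minority, n_new, k=5, seed=0):
    """Simplified SMOTE-style over-sampling (hypothetical helper): each
    synthetic point lies on the segment between a random minority instance
    and one of its k nearest minority neighbours.
    Assumes at least two minority instances and k >= 1."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest *other* minority instances, by Euclidean distance
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1]]
    y = [1, 1, 0, 0, 0, 0]  # class 1 is the minority class
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
    print(smote_like([x for x, lab in zip(X, y) if lab == 1], n_new=3, k=1))
```

Random over-sampling only repeats existing points. The interpolation step, by contrast, places new points between neighbouring minority instances, which spreads the minority region rather than sharpening it. Which of the two helps more is data- and learner-dependent.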
Pages: 213-225
Page count: 13
Number of references: 35