Evaluation of classifiers for an uneven class distribution problem

Cited by: 143
Authors
Daskalaki, S
Kopanas, I
Avouris, N [1]
Affiliations
[1] Univ Patras, Dept Elect & Comp Engn, Human Comp Interact Grp, GR-26500 Rion, Greece
[2] Univ Patras, Dept Engn Sci, Rion, Greece
DOI
10.1080/08839510500313653
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification problems with uneven class distributions present several difficulties during both the training and the evaluation of classifiers. A classification problem with such characteristics arose from a data mining project whose objective was to predict customer insolvency. Using the data set from the customer insolvency problem, we study several alternative methodologies that have been reported to better suit the specific characteristics of this type of problem. Three different but equally important directions are examined: (a) the performance measures that should be used for problems in this domain; (b) the class distributions that should be used for the training data sets; and (c) the classification algorithms to be used. The final evaluation of the resulting classifiers is based on a study of the economic impact of classification results. The study concludes with a framework that provides the "best" classifiers, identifies the performance measures that should be used as the decision criterion, and suggests the "best" class distribution based on the value of the relative gain from correct classification in the positive class. This framework has been applied to the customer insolvency problem, but it is claimed that it can be applied to many similar problems with uneven class distributions, which almost always require a multi-objective evaluation process.
Pages: 381-417
Page count: 37