An experimental comparison of classification algorithms for imbalanced credit scoring data sets

Cited by: 443
Authors
Brown, Iain [1 ]
Mues, Christophe [1 ]
Affiliations
[1] Univ Southampton, Sch Management, Southampton SO17 1BJ, Hants, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Credit scoring; Imbalanced datasets; Classification; Benchmarking; Discriminant analysis; Neural networks; Models; Prediction
DOI
10.1016/j.eswa.2011.09.033
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we set out to compare several techniques that can be used in the analysis of imbalanced credit scoring data sets. In a credit scoring context, imbalanced data sets frequently occur as the number of defaulting loans in a portfolio is usually much lower than the number of observations that do not default. As well as using traditional classification techniques such as logistic regression, neural networks and decision trees, this paper will also explore the suitability of gradient boosting, least squares support vector machines and random forests for loan default prediction. Five real-world credit scoring data sets are used to build classifiers and test their performance. In our experiments, we progressively increase class imbalance in each of these data sets by randomly under-sampling the minority class of defaulters, so as to identify to what extent the predictive power of the respective techniques is adversely affected. The performance criterion chosen to measure this effect is the area under the receiver operating characteristic curve (AUC); Friedman's statistic and Nemenyi post hoc tests are used to test for significance of AUC differences between techniques. The results from this empirical study indicate that the random forest and gradient boosting classifiers perform very well in a credit scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets. We also found that, when faced with a large class imbalance, the C4.5 decision tree algorithm, quadratic discriminant analysis and k-nearest neighbours perform significantly worse than the best performing classifiers. (C) 2011 Elsevier Ltd. All rights reserved.
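To illustrate the kind of experimental protocol the abstract describes, the following is a minimal sketch, not the authors' original code: class imbalance is increased by randomly under-sampling the defaulter (minority) class to a target bad rate, several classifiers are trained, and their AUCs across imbalance levels are compared with Friedman's test. It assumes scikit-learn and scipy are available, uses a synthetic data set in place of the five real-world credit scoring data sets, and the helper undersample_minority is hypothetical.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def undersample_minority(X, y, target_bad_rate):
    """Randomly drop defaulters (class 1) until they form target_bad_rate of the data."""
    bad_idx = np.flatnonzero(y == 1)
    good_idx = np.flatnonzero(y == 0)
    n_bad = int(target_bad_rate * len(good_idx) / (1 - target_bad_rate))
    keep_bad = rng.choice(bad_idx, size=min(n_bad, len(bad_idx)), replace=False)
    keep = np.concatenate([good_idx, keep_bad])
    return X[keep], y[keep]

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# One synthetic data set per imbalance level; the paper instead re-samples
# five real credit scoring data sets at progressively lower bad rates.
bad_rates = [0.30, 0.15, 0.05, 0.02]
auc_table = {name: [] for name in classifiers}

for bad_rate in bad_rates:
    X, y = make_classification(n_samples=20000, n_features=20,
                               weights=[0.7, 0.3], random_state=42)
    X, y = undersample_minority(X, y, bad_rate)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=42)
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        auc_table[name].append(auc)

# Friedman test on the AUCs (blocks = imbalance levels, groups = classifiers).
stat, p_value = friedmanchisquare(*auc_table.values())
print({name: np.round(aucs, 3) for name, aucs in auc_table.items()})
print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.3f}")
```

If the Friedman test rejects the null hypothesis of equal average ranks, a Nemenyi post hoc test (available in third-party packages such as scikit-posthocs) would then identify which pairs of classifiers differ significantly, as done in the study.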
Pages: 3446-3453
Number of pages: 8