Comprehensible credit scoring models using rule extraction from support vector machines

Cited by: 291
Authors
Martens, David
Baesens, Bart
Van Gestel, Tony
Vanthienen, Jan
Affiliations
[1] Katholieke Univ Leuven, Dept Decis Sci & Informat Management, B-3000 Louvain, Belgium
[2] Univ Southampton, Sch Management, Southampton SO17 1BJ, Hants, England
[3] Risk Management, Basel II Modelling, B-1210 Brussels, Belgium
[4] Katholieke Univ Leuven, SISTA, SCD, ESAT, Dept Elect Engn, B-3001 Heverlee, Belgium
DOI
10.1016/j.ejor.2006.04.051
Chinese Library Classification number
C93 [Management];
Discipline classification code
12 ; 1201 ; 1202 ; 120202 ;
Abstract
In recent years, support vector machines (SVMs) have been successfully applied to a wide range of applications. However, since the classifier is described as a complex mathematical function, it is rather incomprehensible to humans. This opacity prevents SVMs from being used in many real-life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation. To overcome this limitation, rules can be extracted from the trained SVM that are interpretable by humans and retain as much of the SVM's accuracy as possible. In this paper, we provide an overview of the recently proposed rule extraction techniques for SVMs and introduce two others taken from the artificial neural networks domain, namely Trepan and G-REX. The described techniques are compared using publicly available datasets, such as Ripley's synthetic dataset and the multi-class iris dataset. We also look at medical diagnosis and credit scoring, where comprehensibility is a key requirement and even a regulatory recommendation. Our experiments show that the SVM rule extraction techniques lose only a small percentage in performance compared to SVMs and therefore rank at the top of comprehensible classification techniques. © 2006 Elsevier B.V. All rights reserved.
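The general idea of extracting rules from an opaque classifier can be illustrated with a minimal sketch. It is not Trepan, G-REX, or any technique from the paper: it is a generic "learn a simple rule from the black box's own predictions" approach, and everything in it is an assumption made for illustration. The function `black_box` merely stands in for a trained SVM (an RBF-kernel expansion with hypothetical support vectors and weights), the extracted rule is a single threshold on one feature, and "fidelity" here means the fraction of points on which the rule agrees with the black box.

```python
import math
import random


def black_box(x):
    """Stand-in for a trained SVM: sign of an RBF-kernel expansion.

    The support vectors, weights and kernel width are hypothetical values
    chosen for illustration, not taken from any real model.
    """
    support = [((0.2, 0.3), -1.0), ((0.8, 0.7), 1.0),
               ((0.7, 0.2), 0.6), ((0.3, 0.8), -0.6)]
    score = sum(
        alpha * math.exp(-4.0 * ((x[0] - s[0]) ** 2 + (x[1] - s[1]) ** 2))
        for s, alpha in support
    )
    return 1 if score >= 0 else 0


def extract_stump(points, labels):
    """Search all single-feature threshold rules; return the one that agrees
    most often with the given labels as (agreement, feature, threshold, cls)."""
    best = None
    for feat in range(len(points[0])):
        for thr in sorted({p[feat] for p in points}):
            for cls_above in (0, 1):  # class predicted when x[feat] > thr
                preds = [cls_above if p[feat] > thr else 1 - cls_above
                         for p in points]
                agree = sum(a == b for a, b in zip(preds, labels)) / len(points)
                if best is None or agree > best[0]:
                    best = (agree, feat, thr, cls_above)
    return best


def apply_rule(rule, x):
    """Apply an extracted (agreement, feature, threshold, cls) rule to x."""
    _, feat, thr, cls_above = rule
    return cls_above if x[feat] > thr else 1 - cls_above


rng = random.Random(0)
train = [(rng.random(), rng.random()) for _ in range(200)]
# Key step: the rule learner is fitted to the black box's outputs, not to
# ground-truth labels, so the rule approximates the model itself.
bb_labels = [black_box(p) for p in train]
rule = extract_stump(train, bb_labels)
train_fidelity, feat, thr, cls_above = rule

# Fidelity on fresh points: how often the rule mimics the black box.
fresh = [(rng.random(), rng.random()) for _ in range(200)]
test_fidelity = sum(apply_rule(rule, p) == black_box(p) for p in fresh) / len(fresh)

print(f"IF x[{feat}] > {thr:.3f} THEN class {cls_above} ELSE class {1 - cls_above}")
print(f"fidelity (train) = {train_fidelity:.2f}, fidelity (fresh) = {test_fidelity:.2f}")
```

A single threshold is far cruder than the rule sets and trees discussed in the abstract. It is used here only because it keeps the trade-off between comprehensibility and agreement with the underlying model easy to see.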
Pages: 1466-1476
Number of pages: 11