STATLOG - COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS

Cited by: 163
Authors
KING, RD
FENG, C
SUTHERLAND, A
Affiliations
[1] UNIV STRATHCLYDE, DEPT STAT, GLASGOW G1 1XW, LANARK, SCOTLAND
[2] TURING INST LTD, GLASGOW, LANARK, SCOTLAND
DOI
10.1080/08839519508945477
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve data sets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the data set investigated. We therefore developed a set of data set descriptors to help decide which algorithms are suited to particular data sets. For example, data sets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (> 38%) tend to favor symbolic learning algorithms. We suggest how classification algorithms can be extended in a number of directions.
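The descriptor rule quoted in the abstract can be illustrated with a minimal Python sketch. This is not the StatLog implementation. It assumes that skew and kurtosis are averaged over the numeric attributes (absolute value for skew), that kurtosis is the non-excess form (a normal distribution scores 3), and that the three thresholds are combined with a logical AND. The function names `describe` and `favors_symbolic` are hypothetical.

```python
from statistics import fmean


def skewness(xs):
    """Population skewness: third central moment divided by std**3."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def kurtosis(xs):
    """Population kurtosis, non-excess form (normal distribution = 3)."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m4 = fmean((x - m) ** 4 for x in xs)
    return m4 / m2 ** 2 if m2 > 0 else 0.0


def describe(numeric_columns, n_categorical):
    """Compute three data set descriptors (assumed definitions).

    numeric_columns: list of lists, one list of values per numeric attribute.
    n_categorical:   count of binary/categorical attributes.
    """
    n_total = len(numeric_columns) + n_categorical
    return {
        "skew": fmean(abs(skewness(c)) for c in numeric_columns),
        "kurtosis": fmean(kurtosis(c) for c in numeric_columns),
        "categorical_fraction": n_categorical / n_total,
    }


def favors_symbolic(desc):
    """Apply the thresholds quoted in the abstract as a rough heuristic."""
    return (
        desc["skew"] > 1
        and desc["kurtosis"] > 7
        and desc["categorical_fraction"] > 0.38
    )


# Usage: one heavy-tailed numeric attribute plus one categorical attribute.
heavy_tailed = [0] * 99 + [100]
print(favors_symbolic(describe([heavy_tailed], n_categorical=1)))
```

The abstract reports a tendency, not a guarantee, so a result from `favors_symbolic` is best read as a hint about which algorithm family to try first.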
Pages: 289-333
Page count: 45