CAIM discretization algorithm

Cited by: 286
Authors
Kurgan, LA [1]
Cios, KJ
Affiliations
[1] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2M7, Canada
[2] Univ Colorado, Dept Comp Sci & Engn, Denver, CO 80217 USA
Funding
National Science Foundation (USA); National Aeronautics and Space Administration (NASA);
Keywords
supervised discretization; class-attribute interdependency maximization; classification; CLIP4 machine learning algorithm;
DOI
10.1109/TKDE.2004.1269594
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. This paper describes such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was achieved for data sets discretized with the CAIM algorithm, as compared with the other six algorithms.
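The abstract does not state the criterion being maximized. As a rough illustration only, the sketch below assumes the commonly cited form of the CAIM criterion: for n intervals, the average over intervals of max_r^2 / M_r, where max_r is the largest single-class count in interval r and M_r is the total count in that interval. The function names `caim_score` and `caim_discretize`, the midpoint candidate cuts, and the greedy stopping rule are assumptions made for this sketch, not the authors' reference implementation.

```python
from bisect import bisect_left
from collections import Counter


def caim_score(values, labels, boundaries):
    """Assumed CAIM criterion: mean over intervals of max_r**2 / M_r.

    `boundaries` is a sorted list [min, cut_1, ..., max]; intervals are
    closed on the right, and the first interval also includes the minimum.
    """
    inner = boundaries[1:-1]
    n_intervals = len(boundaries) - 1
    counts = [Counter() for _ in range(n_intervals)]
    for v, c in zip(values, labels):
        counts[bisect_left(inner, v)][c] += 1
    total = 0.0
    for col in counts:
        m = sum(col.values())
        if m:  # skip empty intervals to avoid dividing by zero
            total += max(col.values()) ** 2 / m
    return total / n_intervals


def caim_discretize(values, labels):
    """Greedy top-down sketch: add the cut that maximizes the score."""
    distinct = sorted(set(values))
    # Candidate cuts: midpoints between adjacent distinct values (assumption).
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    boundaries = [distinct[0], distinct[-1]]
    n_classes = len(set(labels))
    best = 0.0
    while candidates:
        score, cut = max(
            (caim_score(values, labels, sorted(boundaries + [c])), c)
            for c in candidates
        )
        # Accept the cut if it improves the score, or while there are
        # still fewer intervals than classes (assumed stopping rule).
        if score > best or len(boundaries) - 1 < n_classes:
            boundaries = sorted(boundaries + [cut])
            candidates.remove(cut)
            best = score
        else:
            break
    return boundaries


values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
print(caim_discretize(values, labels))
```

Note that no interval count is passed in: the loop stops on its own once no remaining cut raises the score, which mirrors the abstract's point that the user does not predefine the number of intervals.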
Pages: 145-153
Page count: 9
Related papers
29 in total
[1] [Anonymous], 1992, The Tenth National Conference on Artificial Intelligence
[2] [Anonymous], 2000, STATLIB PROJECT REPO
[3] [Anonymous], P IEEE C SYST MAN CY
[4] [Anonymous], 1993, P 13 INT JOINT C ART
[5] [Anonymous], 1998, DATA MINING METHODS
[6] BLAKE C, 1998, UCI REPOSITORY MACH
[7] CATLETT J, 1991, LECT NOTES ARTIF INT, V482, P164, DOI 10.1007/BFb0017012
[8] CLASS-DEPENDENT DISCRETIZATION FOR INDUCTIVE LEARNING FROM CONTINUOUS AND MIXED-MODE DATA [J]. CHING, JY; WONG, AKC; CHAN, KCC. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1995, 17(07): 641-651
[9] CHIU D, 1991, KNOWLEDGE DISCOVERY
[10] CIOS KJ, 2001, NEW LEARNING PARADIG, P276