Theoretical comparison between the Gini Index and Information Gain criteria

Cited by: 424
Authors
Raileanu, LE [1]
Stoffel, K [1]
Affiliations
[1] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
Keywords
decision trees; classification; Gini Index; information gain; theoretical comparison
DOI
10.1023/B:AMAI.0000018580.96245.c6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge Discovery in Databases (KDD) is an active and important research area with the promise of a high payoff in many business and scientific applications. One of the main tasks in KDD is classification. A particularly efficient method for classification is decision tree induction. The selection of the attribute used at each node of the tree to split the data (the split criterion) is crucial for classifying objects correctly. Different split criteria have been proposed in the literature (Information Gain, Gini Index, etc.). It is not obvious which of them will produce the best decision tree for a given data set. A large number of empirical tests have been conducted to answer this question, but no conclusive results were found. In this paper we introduce a formal methodology that allows us to compare multiple split criteria. This permits us to present fundamental insights into the decision process. Furthermore, we are able to present a formal description of how to select between split criteria for a given data set. As an illustration, we apply the methodology to two widely used split criteria: Gini Index and Information Gain.
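
For readers unfamiliar with the two split criteria named in the abstract, the following is a minimal sketch of how each one scores a candidate split. It uses the common textbook definitions (Gini impurity and Shannon entropy in bits), which this record does not itself state, so it should not be read as the paper's formal methodology. The function names `gini`, `entropy`, and `split_score` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def split_score(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children.

    With impurity=entropy this is Information Gain; with impurity=gini it is
    the decrease in Gini impurity. Assumes the children partition the parent.
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


if __name__ == "__main__":
    # Illustrative toy data (not from the paper): 4 objects per class.
    parent = ["a"] * 4 + ["b"] * 4
    candidate = [["a", "a", "a", "b"], ["a", "b", "b", "b"]]
    print("Gini decrease:   ", split_score(parent, candidate, gini))
    print("Information gain:", split_score(parent, candidate, entropy))
```

At each node, a tree learner would evaluate one such score per candidate attribute and split on the attribute with the highest score. The two criteria can rank candidate attributes differently, which is the situation the paper sets out to analyze.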
Pages: 77-93
Number of pages: 17