A DISTANCE-BASED ATTRIBUTE SELECTION MEASURE FOR DECISION TREE INDUCTION

Cited by: 236
Authors
DEMANTARAS, RL
Affiliation
[1] Centre of Advanced Studies, CSIC, Girona
Keywords
DISTANCE BETWEEN PARTITIONS; DECISION TREE INDUCTION; INFORMATION MEASURES;
DOI
10.1023/A:1022694001379
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This note introduces a new attribute selection measure for ID3-like inductive algorithms. This measure is based on a distance between partitions such that the selected attribute in a node induces the partition which is closest to the correct partition of the subset of training examples corresponding to this node. The relationship of this measure with Quinlan's information gain is also established. It is also formally proved that our distance is not biased towards attributes with large numbers of values. Experimental studies with this distance confirm previously reported results showing that the predictive accuracy of induced decision trees is not sensitive to the goodness of the attribute selection measure. However, this distance produces smaller trees than the gain ratio measure of Quinlan, especially in the case of data whose attributes have significantly different numbers of values.
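The abstract does not state the formula, so the following is only a minimal sketch under an assumption: that the distance is the normalized form commonly attributed to this paper, d_N(A, C) = (H(A|C) + H(C|A)) / H(A, C), where A is the partition induced by an attribute and C is the class partition. Under that assumption, 1 - d_N equals the information gain divided by the joint entropy H(A, C), which is one way to read the stated relationship with Quinlan's gain. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(attr_values, classes):
    """Gain = H(C) - H(C|A), written here as H(A) + H(C) - H(A, C)."""
    h_joint = entropy(list(zip(attr_values, classes)))
    return entropy(attr_values) + entropy(classes) - h_joint


def normalized_partition_distance(attr_values, classes):
    """Assumed form: d_N = (H(A|C) + H(C|A)) / H(A, C), a value in [0, 1]."""
    h_joint = entropy(list(zip(attr_values, classes)))
    if h_joint == 0:
        return 0.0  # both partitions consist of a single block
    # H(A|C) + H(C|A) = 2 H(A, C) - H(A) - H(C)
    return (2 * h_joint - entropy(attr_values) - entropy(classes)) / h_joint


# Hypothetical toy node with four training examples.
classes = ["yes", "yes", "no", "no"]
binary_attr = ["a", "a", "b", "b"]   # two values, matches the classes exactly
id_like_attr = ["1", "2", "3", "4"]  # one distinct value per example
noise_attr = ["x", "y", "x", "y"]    # independent of the classes

for name, attr in [("binary", binary_attr),
                   ("id-like", id_like_attr),
                   ("noise", noise_attr)]:
    print(name,
          information_gain(attr, classes),
          normalized_partition_distance(attr, classes))
```

In this toy case the binary and the id-like attribute both reach an information gain of 1 bit, while the assumed distance is 0 for the binary attribute and 0.5 for the id-like one, so selecting the attribute with the smallest distance does not reward the larger number of values.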
Pages: 81-92
Number of pages: 12
Related papers
10 in total
  • [1] BRATKO I, 1986, SEMINAR AI METHODS S
  • [2] BREIMAN I, 1984, CLASSIFICATION REGRE
  • [3] Cestnik B., 1987, PROGR MACHINE LEARNI
  • [4] Hart A., 1984, RES DEV EXPERT SYSTE
  • [5] KONONENKO I, 1984, EXPT AUTOMATIC LEARN
  • [6] Lopez De Mantaras R., 1977, THESIS P SABATIER U
  • [7] Mingers J., 1989, Machine Learning, V3, P319, DOI 10.1007/BF00116837
  • [8] Niblett T., 1987, PROGR MACHINE LEARNI
  • [9] Quinlan J. R., 1986, Machine Learning, V1, P81, DOI 10.1023/A:1022643204877
  • [10] Quinlan J. R., 1979, EXPERT SYSTEMS MICRO