Improved use of continuous attributes in C4.5

Cited by: 1090
Author
Quinlan, JR
Affiliation
[1] Basser Dept. of Computer Science, University of Sydney
DOI
10.1613/jair.279
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes. An MDL-inspired penalty is applied to such tests, eliminating some of them from consideration and altering the relative desirability of all tests. Empirical trials show that the modifications lead to smaller decision trees with higher predictive accuracies. Results also confirm that a new version of C4.5 incorporating these changes is superior to recent approaches that use global discretization and that construct small trees with multi-interval splits.
Pages: 77-90
Page count: 14