DECISION TREE DESIGN FROM A COMMUNICATION-THEORY STANDPOINT

被引:32
作者
GOODMAN, RM
SMYTH, P
机构
[1] California Inst of Technology,, Pasadena, CA, USA
关键词
Computer Programming--Analysis - Decision Theory and Analysis;
D O I
10.1109/18.21221
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
A communication theory approach to decision-tree-design based on a top-town mutual information algorithm is presented. It is shown that this algorithm is equivalent to a form of Shannon-Fano prefix coding, and several fundamental bounds relating decision-tree parameters are derived. The bounds are used in conjuction with a rate-distortion interpretation of tree design to explain several phenomena previously observed in practical decision-tree design. A termination rule for the algorithm called the delta-entropy rule is proposed that improves its robustness in the presence of noise. Simulation results are presented, showing that the tree classifiers derived by the algorithm compare favorably to the single-nearest-neighbor classifier.
引用
收藏
页码:979 / 994
页数:16
相关论文
共 31 条