Classification trees with unbiased multiway splits

Cited by: 243
Authors
Kim, H [1]
Loh, WY [2]
Affiliations
[1] Worcester Polytech Inst, Dept Math Sci, Worcester, MA 01609 USA
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Keywords
decision tree; linear discriminant analysis; missing value; selection bias
DOI
10.1198/016214501753168271
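Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714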
Abstract
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the number of splits they offer and when they differ in the number of missing values. This is an advantage because inferences from the tree structures can be adversely affected by selection bias. The new methods are shown to be highly competitive in terms of computational speed and classification accuracy of future observations.
Pages: 589-604
Number of pages: 16