Split selection methods for classification trees

Cited by: 22
Authors
Loh, WY
Shih, YS
Affiliations
[1] UNIV WISCONSIN, DEPT STAT, MADISON, WI 53706
[2] NATL CHUNG CHENG UNIV, DEPT MATH, CHIAYI 621, TAIWAN
Keywords
decision trees; discriminant analysis; machine learning
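DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714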
Abstract
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster, and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.
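The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not QUEST and is not taken from the article; it is a minimal exhaustive Gini search written for illustration, and the function names (`gini`, `best_split_gain`, `selection_counts`) and simulation settings are assumptions. Both predictors are pure noise, so an unbiased method would pick each about equally often, yet the continuous predictor, which offers many more candidate splits, tends to win.

```python
import random


def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def best_split_gain(x, y):
    """Largest Gini decrease over all splits of the form x <= c (binary labels y)."""
    n = len(y)
    pairs = sorted(zip(x, y))
    total1 = sum(y)
    parent = gini([n - total1, total1])
    best = 0.0
    left_n = left1 = 0
    for i in range(n - 1):
        left_n += 1
        left1 += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no valid threshold between tied values
        right_n = n - left_n
        right1 = total1 - left1
        child = (left_n * gini([left_n - left1, left1])
                 + right_n * gini([right_n - right1, right1])) / n
        best = max(best, parent - child)
    return best


def selection_counts(n_trials=500, n=50, seed=0):
    """Count how often each noise predictor is chosen by exhaustive search."""
    rng = random.Random(seed)
    wins = {"binary": 0, "continuous": 0}
    for _ in range(n_trials):
        y = [rng.randint(0, 1) for _ in range(n)]       # labels independent of x
        x_bin = [rng.randint(0, 1) for _ in range(n)]   # at most one possible split
        x_cont = [rng.random() for _ in range(n)]       # up to n - 1 possible splits
        if best_split_gain(x_cont, y) > best_split_gain(x_bin, y):
            wins["continuous"] += 1
        else:
            wins["binary"] += 1  # ties are credited to the binary predictor
    return wins


if __name__ == "__main__":
    print(selection_counts())
```

Ties are deliberately credited to the binary predictor, so any excess of wins for the continuous predictor understates the effect rather than inflating it.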
Pages: 815-840
Number of pages: 26