RainForest—A Framework for Fast Decision Tree Construction of Large Datasets

Cited by: 7
Authors
Johannes Gehrke
Raghu Ramakrishnan
Venkatesh Ganti
Affiliation
[1] University of Wisconsin, Department of Computer Sciences
Source
Data Mining and Knowledge Discovery | 2000, Vol. 4
Keywords
data mining; decision trees; classification; scalability
DOI
Not available
Abstract
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework called RainForest for classification tree construction that separates the scalability aspects of algorithms for constructing a tree from the central features that determine the quality of the tree. The generic algorithm is easy to instantiate with specific split selection methods from the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT, and QUEST).
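The separation described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It is a minimal in-memory illustration, assuming categorical attributes and a split selection method that only needs per-attribute class counts. The names `class_counts_by_value`, `info_gain_selector`, `build_tree`, and `predict` are hypothetical. The aggregation step stands in for the part of the framework concerned with scalability, and the selector stands in for the part that determines tree quality.

```python
from collections import Counter, defaultdict
from math import log2


def class_counts_by_value(rows, attr):
    """Aggregate class-label counts per value of one attribute.

    Hypothetical stand-in for the data-access layer: the selector below
    only ever sees these counts, never the raw rows.
    """
    counts = defaultdict(Counter)
    for features, label in rows:
        counts[features[attr]][label] += 1
    return counts


def entropy(counter):
    """Shannon entropy (in bits) of a Counter of class labels."""
    total = sum(counter.values())
    return -sum((n / total) * log2(n / total) for n in counter.values() if n)


def info_gain_selector(counts_per_attr, label_counts):
    """One possible split selection method: ID3-style information gain.

    Returns the attribute with the highest positive gain, or None.
    """
    total = sum(label_counts.values())
    base = entropy(label_counts)
    best_attr, best_gain = None, 0.0
    for attr, counts in counts_per_attr.items():
        remainder = sum(
            sum(c.values()) / total * entropy(c) for c in counts.values()
        )
        gain = base - remainder
        if gain > best_gain + 1e-12:
            best_attr, best_gain = attr, gain
    return best_attr


def build_tree(rows, attrs, select_split):
    """Generic top-down construction; `select_split` is pluggable."""
    label_counts = Counter(label for _, label in rows)
    majority = label_counts.most_common(1)[0][0]
    if len(label_counts) == 1 or not attrs:
        return majority  # leaf: pure node or no attributes left
    counts_per_attr = {a: class_counts_by_value(rows, a) for a in attrs}
    attr = select_split(counts_per_attr, label_counts)
    if attr is None:
        return majority  # leaf: selector found no useful split
    remaining = [a for a in attrs if a != attr]
    children = {}
    for value in counts_per_attr[attr]:
        subset = [(f, l) for f, l in rows if f[attr] == value]
        children[value] = build_tree(subset, remaining, select_split)
    return (attr, children)


def predict(tree, features, default=None):
    """Walk the tree; internal nodes are (attribute, children) tuples."""
    while isinstance(tree, tuple):
        attr, children = tree
        if features[attr] not in children:
            return default
        tree = children[features[attr]]
    return tree


# Made-up toy data, for illustration only.
rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "no"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
]
tree = build_tree(rows, ["outlook", "windy"], info_gain_selector)
```

Swapping in a different selector, for example one based on a Gini-style impurity, would change only the function passed to `build_tree`. The construction loop and the aggregation step would stay the same, which is the kind of separation the abstract describes.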
Pages: 127-162
Page count: 35