A survey of methods for scaling up inductive algorithms

Cited by: 164
Authors
Provost, F
Kolluri, V
Affiliations
[1] Bell Atlantic Sci & Technol, White Plains, NY 10604 USA
[2] Univ Pittsburgh, Dept Informat Sci, Pittsburgh, PA 15260 USA
[3] Lycos Inc, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
scaling up; inductive learning; decision trees; rule learning
DOI
10.1023/A:1009876119989
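Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligent Science and Technology]
Abstract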
One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We highlight similarities among scaling techniques by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent techniques, drawing on specific examples from published papers. Finally, we use the preceding analysis to suggest how to proceed when dealing with a large problem, and where to focus future research.
Pages: 131-169
Page count: 39