DATABASE MINING - A PERFORMANCE PERSPECTIVE

Cited by: 881
Authors
AGRAWAL, R
IMIELINSKI, T
SWAMI, A
Affiliations
[1] IBM Almaden Research Center, 650 Harry Road, San Jose
[2] Computer Science Department, Rutgers University, New Brunswick
Keywords
ASSOCIATIONS; CLASSIFICATION; DATABASE MINING; DECISION TREES; KNOWLEDGE DISCOVERY; SEQUENCES
DOI
10.1109/69.250074
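Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligence Science and Technology]
Abstract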
We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. We show how the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm not only is efficient in discovering classification rules but also has accuracy comparable to ID3, one of the current best classifiers.
Pages: 914-925
Number of pages: 12