Beyond market baskets: Generalizing association rules to dependence rules

Citations: 251
Authors
Silverstein, C [1]
Brin, S [1]
Motwani, R [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Funding
National Science Foundation (USA)
Keywords
data mining; market basket; association rules; dependence rules; closure properties; text mining;
DOI
10.1023/A:1009713703947
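Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligence science and technology]
Abstract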
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: "A customer purchasing item A often also purchases item B." Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm's effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
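As an illustration of the dependence measure the abstract describes, the following is a minimal sketch, not the authors' implementation, of the classical chi-squared statistic computed over every presence/absence pattern of an itemset. The function name `chi_squared`, the basket representation (a list of Python sets), and the toy data are assumptions made for this example. Expected counts come from the independence assumption: the number of baskets times the product of each item's marginal probability of being present or absent.

```python
from itertools import product


def chi_squared(baskets, items):
    """Chi-squared statistic for independence over presence/absence of `items`.

    `baskets` is a list of sets; `items` is a sequence of item labels.
    Hypothetical helper written for illustration only.
    """
    n = len(baskets)
    # Marginal probability that each item is present in a basket.
    p = {i: sum(1 for b in baskets if i in b) / n for i in items}
    stat = 0.0
    # One contingency-table cell per presence/absence pattern of the items.
    for pattern in product([True, False], repeat=len(items)):
        observed = sum(
            1
            for b in baskets
            if all((i in b) == present for i, present in zip(items, pattern))
        )
        # Expected count if the items were statistically independent.
        expected = float(n)
        for i, present in zip(items, pattern):
            expected *= p[i] if present else 1 - p[i]
        if expected > 0:  # skip cells that are impossible under the marginals
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy data (assumed): two items that always occur together.
dependent = [{"a", "b"}] * 5 + [set()] * 5
# Toy data (assumed): every presence/absence combination equally often.
independent = [{"a", "b"}, {"a"}, {"b"}, set()]

chi_dep = chi_squared(dependent, ["a", "b"])
chi_ind = chi_squared(independent, ["a", "b"])
```

For a pair of items the table is 2x2 with one degree of freedom, so a statistic above 3.84 rejects independence at the 95% level. In the toy data, the perfectly co-occurring pair exceeds that cutoff while the balanced pair scores zero.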
Pages: 39-68
Page count: 30