Constraint-based rule mining in large, dense databases

Cited by: 126
Authors
Bayardo, RJ [1]
Agrawal, R [1]
Gunopulos, D [1]
Affiliation
[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA
Keywords
data mining; association rules; rule induction
DOI
10.1023/A:1009895914772
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Constraint-based rule miners find all rules in a given data-set meeting user-specified constraints such as minimum support and confidence. We describe a new algorithm that directly exploits all user-specified constraints including minimum support, minimum confidence, and a new constraint that ensures every mined rule offers a predictive advantage over any of its simplifications. Our algorithm maintains efficiency even at low supports on data that is dense (e.g. relational tables). Previous approaches such as Apriori and its variants exploit only the minimum support constraint, and as a result are ineffective on dense data due to a combinatorial explosion of "frequent itemsets".
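
The three constraints named in the abstract can be illustrated with a small brute-force sketch. This is not the algorithm the paper describes: it enumerates every antecedent up to a fixed length, which is exactly the cost the paper aims to avoid. The function names (`support`, `confidence`, `improvement`, `mine`), the `max_len` cap, and the reading of "predictive advantage over any of its simplifications" as the confidence of a rule minus the highest confidence among rules with a proper subset of its antecedent (including the empty antecedent) are assumptions made for illustration.

```python
from itertools import combinations


def support(rows, items):
    """Count the rows that contain every item in `items`."""
    return sum(1 for row in rows if items <= row)


def confidence(rows, antecedent, consequent):
    """Fraction of rows matching the antecedent that also contain the consequent."""
    covered = support(rows, antecedent)
    if covered == 0:
        return 0.0
    return support(rows, antecedent | {consequent}) / covered


def improvement(rows, antecedent, consequent):
    """Rule confidence minus the best confidence among its simplifications.

    A simplification is assumed here to be any rule whose antecedent is a
    proper subset of `antecedent`, including the empty antecedent.
    """
    best_simpler = max(
        confidence(rows, frozenset(sub), consequent)
        for size in range(len(antecedent))
        for sub in combinations(sorted(antecedent), size)
    )
    return confidence(rows, antecedent, consequent) - best_simpler


def mine(rows, consequent, min_sup, min_conf, min_imp, max_len=3):
    """Brute-force enumeration of rules meeting all three constraints.

    Illustrative only: a real miner would prune the search space instead
    of testing every candidate antecedent.
    """
    items = sorted({item for row in rows for item in row} - {consequent})
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            # Rule support is taken as the count of rows matching both sides.
            if support(rows, antecedent | {consequent}) < min_sup:
                continue
            if confidence(rows, antecedent, consequent) < min_conf:
                continue
            if improvement(rows, antecedent, consequent) < min_imp:
                continue
            rules.append(antecedent)
    return rules


# Hypothetical toy data: each row is the set of items it contains.
rows = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c"},
    {"b"},
]
rules = mine(rows, "c", min_sup=2, min_conf=0.7, min_imp=0.05)
```

On this toy data the rule with antecedent {a, b} has lower confidence than the simpler rule with antecedent {a}, so its improvement is negative and it is filtered out no matter how the support and confidence thresholds are set.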
Pages: 217-240
Page count: 24