Mining frequent itemsets without support threshold: With and without item constraints

被引:72
作者
Cheung, YL [1 ]
Fu, AWC [1 ]
机构
[1] Chinese Univ Hong Kong, Dept Comp Sci, Shatin, Hong Kong, Peoples R China
关键词
association rules; N-most interesting itemsets; FP-tree; item constraints;
D O I
10.1109/TKDE.2004.44
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In classical association rules mining, a minimum support threshold is assumed to be available for mining frequent itemsets. However, setting such a threshold is typically hard. In this paper, we handle a more practical problem; roughly speaking, it is to mine N k-itemsets with the highest supports for k up to a certain k(max) value. We call the results the N-most interesting itemsets. Generally, it is more straightforward for users to determine N and k(max). We propose two new algorithms, LOOPBACK and BOMO. Experiments show that our methods outperform the previously proposed Itemset-Loop algorithm, and the performance of BOMO can be an order of magnitude better than the original FP-tree algorithm, even with the assumption of an optimally chosen support threshold. We also propose the mining of "N-most interesting k-itemsets with item constraints." This allows user to specify different degrees of interestingness for different itemsets. Experiments show that our proposed Double FP-trees algorithm, which is based on BOMO, is highly efficient in solving this problem.
引用
收藏
页码:1052 / 1069
页数:18
相关论文
共 26 条
[1]  
AGGARWAL CC, 1998, B IEEE COMPUTER SOC, P23
[2]  
AGRAWAL R, 1998, P ACM SIGMOD C MAN D
[3]  
Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
[4]  
[Anonymous], P INT C VER LARG DAT
[5]  
Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
[6]  
Cheung Y.L., 2002, P SPIE C DAT MIN
[7]  
FU AWC, 2000, P INT S METH INT SYS
[8]  
GRAHNE G, 2000, P INT C DAT ENG ICDE
[9]  
Han J., 2000, P ACM SIGMOD INT C M, P1
[10]  
*IBM ALM RES CTR, SYNTH DAT GEN COD AS