Scalable algorithms for association mining

被引:956
作者
Zaki, MJ [1 ]
机构
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
关键词
association rules; frequent itemsets; equivalence classes; maximal cliques; lattices; data mining;
D O I
10.1109/69.846291
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then, forming conditional implication rules among them. In this paper. we present efficient algorithms for the discovery of frequent itemsets which forms the compute intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks or sublattices, which can be solved in memory. Efficient lattice traversal techniques are presented which quickly identify all the long frequent itemsets and their subsets if required. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining improvements of more than an order of magnitude for our test databases.
引用
收藏
页码:372 / 390
页数:19
相关论文
共 31 条
[1]   Parallel mining of association rules [J].
Agrawal, R ;
Shafer, JC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :962-969
[2]  
Agrawal R., 1993, ACM SIGMOD C MAN DAT
[3]  
Agrawal R., 1994, P 20 VER LARG DAT BA
[4]  
Agrawal R., 1996, Advances in Knowledge Discovery and Data Mining, P307
[5]  
[Anonymous], 1997, P 3 KDD
[6]  
[Anonymous], 1979, Computers and Intractablity: A Guide to the Theoryof NP-Completeness
[7]  
[Anonymous], ACM SIGMOD C MAN DAT
[8]  
BRIN S, 1997, ACM SIGMOD C MAN DAT
[9]  
CHEUNG D, 1996, 4 INT C PAR DISTR IN
[10]  
Davey B., 1990, INTRO LATTICES ORDER