Scalable parallel data mining for association rules

被引:74
作者
Han, EH [1 ]
Karypis, G
Kumar, V
机构
[1] Univ Minnesota, USA, HPC Res Ctr, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
基金
美国国家科学基金会;
关键词
data mining; parallel processing; association rules; load balance; scalability;
D O I
10.1109/69.846289
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we propose two new parallel formulations of the Apriori algorithm that is used for computing association rules. These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD. Unlike the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD is a hybrid algorithm that combines the advantages of Co and DD. Experimental results on a 128-processor Gray T3E show that HD scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing candidate set size.
引用
收藏
页码:337 / 352
页数:16
相关论文
共 16 条
[1]   Parallel mining of association rules [J].
Agrawal, R ;
Shafer, JC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :962-969
[2]  
AGRAWAL R, 1993, P 1993 ACM SIGMOD IN
[3]  
Agrawal R., 1994, P 20 INT C VER LARG, V1215, P487
[4]  
[Anonymous], P INT C VER LARG DAT
[5]   Efficient mining of association rules in distributed databases [J].
Cheung, DW ;
Ng, VT ;
Fu, AW ;
Fu, YJ .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (06) :911-922
[6]  
HAN EH, 1997, P 1997 ACM SIGMOD IN
[7]  
HOUTSMA M, 1995, PROC INT CONF DATA, P25, DOI 10.1109/ICDE.1995.380413
[8]  
Kumar V., 1994, INTRO PARALLEL COMPU, V400
[9]  
Ogihara ZP., 1997, 3 INT C KNOWL DISC D
[10]  
Papadimitriou Christos H., 1981, Combinatorial Optimization: Algorithms and Complexity