Hyperclique pattern discovery

被引:114
作者
Xiong, Hui
Tan, Pang-Ning
Kumar, Vipin
机构
[1] Rutgers State Univ, Dept Management Sci & Informat Syst, Newark, NJ 07102 USA
[2] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[3] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
基金
美国国家科学基金会;
关键词
association analysis; hyperclique patterns; pattern Mining;
D O I
10.1007/s10618-006-0043-9
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Existing algorithms for mining association patterns often rely on the support-based pruning strategy to prune a combinatorial search space. However, this strategy is not effective for discovering potentially interesting patterns at low levels of support. Also, it tends to generate too many spurious patterns involving items which are from different support levels and are poorly correlated. In this paper, we present a framework for mining highly-correlated association patterns called hyperclique patterns. In this framework, an objective measure called h-confidence is applied to discover hyperclique patterns. We prove that the items in a hyperclique pattern have a guaranteed level of global pairwise similarity to one another as measured by the cosine similarity (uncentered Pearson's correlation coefficient). Also, we show that the h-confidence measure satisfies a cross-support property which can help efficiently eliminate spurious patterns involving items with substantially different support levels. Indeed, this cross-support property is not limited to h-confidence and can be generalized to some other association measures. In addition, an algorithm called hyperclique miner is proposed to exploit both cross-support and anti-monotone properties of the h-confidence measure for the efficient discovery of hyperclique patterns. Finally, our experimental results show that hyperclique miner can efficiently identify hyperclique patterns, even at extremely low levels of support.
引用
收藏
页码:219 / 242
页数:24
相关论文
共 26 条
[1]  
Agarwal R., 1994, P 20 INT C VER LARG, V487, P499
[2]  
Agarwal R., 2000, J PARALLEL DISTRIBUT
[3]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[4]  
[Anonymous], 1949, Human behaviour and the principle of least-effort
[5]  
Bayardo Jr R.J., 1998, P 1998 ACM SIGMOD IN
[6]  
BAYARDO R, 1999, P INT C DAT ENG
[7]  
Brin S., 1997, P 1997 ACM SIGMOD IN, P265, DOI DOI 10.1145/253262.253327
[8]  
BURDICK D, 2001, P 2001 INT C DAT ENG
[9]  
Cohen E., 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073), P489, DOI 10.1109/ICDE.2000.839448
[10]   CLIQUE PARTITIONS, GRAPH COMPRESSION AND SPEEDING-UP ALGORITHMS [J].
FEDER, T ;
MOTWANI, R .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1995, 51 (02) :261-272