BitTableFI: An efficient mining frequent itemsets algorithm

Cited by: 87

Authors:

Dong, Jie [1]

Han, Min [1]

Affiliation:

[1] Dalian Univ Technol, Sch Elect & Informat Engn, Dalian 116023, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2007, Vol. 20, No. 4

Keywords:

data mining; frequent itemsets; BitTable; database compressing

DOI:

10.1016/j.knosys.2006.08.005

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Mining frequent itemsets in transaction databases, time-series databases, and many other kinds of databases is an important task that has been widely studied in data mining research. The problem can be solved by first constructing a candidate set of itemsets and then identifying the itemsets within this candidate set that meet the frequent itemset requirement. Most previous research focuses on pruning to reduce the number of candidate itemsets and the number of database scans. However, many algorithms adopt an Apriori-like approach to candidate itemset generation and support counting, which is the most time-consuming part of the process. To address this issue, the paper proposes an effective algorithm named BitTableFI. In the algorithm, a special data structure, BitTable, is used horizontally and vertically to compress the database for quick candidate itemset generation and support counting, respectively. The algorithm can also be used in many Apriori-like algorithms to improve their performance. Experiments with both synthetic and real databases show that BitTableFI outperforms Apriori and CBAR, which uses ClusterTable for quick support counting. (c) 2006 Elsevier B.V. All rights reserved.
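
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of bitmap-based support counting in an Apriori-like miner, not the authors' BitTableFI code. It assumes a vertical layout in which each item maps to an integer whose bit t is set when transaction t contains the item; the support of an itemset is then the number of set bits in the bitwise AND of its items' bitmaps. The function names (`build_vertical_bitmaps`, `support`, `frequent_itemsets`) are hypothetical, and candidate generation here uses a plain prefix join on sorted tuples rather than the horizontal BitTable described in the paper.

```python
from itertools import combinations


def build_vertical_bitmaps(transactions):
    """Map each item to an int whose bit t is set iff transaction t contains it."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(itemset, bitmaps):
    """Count the transactions containing every item: popcount of the ANDed bitmaps."""
    items = iter(itemset)
    acc = bitmaps.get(next(items), 0)
    for item in items:
        acc &= bitmaps.get(item, 0)
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-like) mining; returns {sorted item tuple: support count}."""
    bitmaps = build_vertical_bitmaps(transactions)
    current = sorted(
        (item,) for item in bitmaps if support((item,), bitmaps) >= min_support
    )
    result = {}
    while current:
        for itemset in current:
            result[itemset] = support(itemset, bitmaps)
        # Join step: two frequent k-itemsets that share their first k-1 items
        # produce one (k+1)-candidate. `current` is sorted, so a < b and the
        # resulting tuple stays sorted.
        candidates = set()
        for a, b in combinations(current, 2):
            if a[:-1] == b[:-1]:
                candidates.add(a + (b[-1],))
        # Support check via bitwise AND; no database rescan is needed.
        current = sorted(
            c for c in candidates if support(c, bitmaps) >= min_support
        )
    return result


if __name__ == "__main__":
    demo = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in frequent_itemsets(demo, min_support=3).items():
        print(itemset, count)
```

In this sketch the database is read once to build the bitmaps, after which every support count is a handful of integer operations. That is the kind of saving the abstract attributes to the vertical use of BitTable, though the paper's actual data layout may differ.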

Citation

Pages: 329-335

Page count: 7
