AT-Mine: An Efficient Algorithm of Frequent Itemset Mining on Uncertain Dataset

被引:43
作者
Wang, Le [1 ,2 ]
Feng, Lin [1 ,2 ]
Wu, Mingfei [1 ,2 ]
机构
[1] Dalian Univ Technol, Fac Elect Informat & Elect Engn, Sch Comp Sci & Technol, Dalian 116024, Liaoning, Peoples R China
[2] Dalian Univ Technol, Sch Innovat & Expt, Dalian 116024, Liaoning, Peoples R China
关键词
data mining; frequent itemset; frequent pattern; uncertain dataset;
D O I
10.4304/jcp.8.6.1417-1426
中图分类号
TP39 [计算机的应用];
学科分类号
081203 [计算机应用技术]; 0835 [软件工程];
摘要
Frequent itemset/pattern mining (FIM) over uncertain transaction dataset is a fundamental task in data mining. In this paper, we study the problem of FIM over uncertain datasets. There are two main approaches for FIM: the level-wise approach and the pattern-growth approach. The level-wise approach requires multiple scans of dataset and generates candidate itemsets. The pattern-growth approach requires a large amount of memory and computation time to process tree nodes because the current algorithms for uncertain datasets cannot create a tree as compact as the original FP-Tree. In this paper, we propose an array based tail node tree structure (namely AT-Tree) to maintain transaction itemsets, and a pattern-growth based algorithm named AT-Mine for FIM over uncertain dataset. AT-Tree is created by two scans of dataset and it is as compact as the original FP-Tree. AT-Mine mines frequent itemsets from AT-Tree without additional scan of dataset. We evaluate our algorithm using sparse and dense datasets; the experimental results show that our algorithm has achieved better performance than the state-of-the-art FIM algorithms on uncertain transaction datasets, especially for small minimum expected support number.
引用
收藏
页码:1417 / 1426
页数:10
相关论文
共 33 条
[1]
Agarwal R.C., 2001, ACM SIGKDD INT C KNO, P108
[2]
Aggarwal CC, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P29
[3]
A Survey of Uncertain Data Algorithms and Applications [J].
Aggarwal, Charu C. ;
Yu, Philip S. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (05) :609-623
[4]
Agrawal R., 1994, 20 VLDB C, P487, DOI DOI 10.1007/BF02948845
[5]
Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[6]
DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets [J].
Bay Vo ;
Hong, Tzung-Pei ;
Bac Le .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (08) :7196-7206
[7]
Bayardo R. J. Jr., 1998, SIGMOD Record, V27, P85, DOI 10.1145/276305.276313
[8]
MAFIA: A maximal frequent itemset algorithm [J].
Burdick, D ;
Calimlim, M ;
Flannick, J ;
Gehrke, J ;
Yiu, TM .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (11) :1490-1504
[9]
Calders Toon, 2010, Proceedings 2010 10th IEEE International Conference on Data Mining (ICDM 2010), P749, DOI 10.1109/ICDM.2010.42
[10]
Chui CK, 2007, LECT NOTES COMPUT SC, V4426, P47