A survey of itemset mining

Cited by: 208
Authors
Fournier-Viger, Philippe [1]
Lin, Jerry Chun-Wei [2]
Bay Vo [3, 4]
Tin Truong Chi [5]
Zhang, Ji [6]
Hoai Bac Le [3]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Sch Nat Sci & Humanities, Shenzhen, Peoples R China
[2] Harbin Inst Technol, Shenzhen Grad Sch, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[3] Ho Chi Minh City Univ Technol, Fac Informat Technol, Ho Chi Minh City, Vietnam
[4] Sejong Univ, Coll Elect & Informat Engn, Seoul, South Korea
[5] Univ DaLat, Dept Math & Informat, Da Lat, Vietnam
[6] Univ Southern Queensland, Fac Hlth Engn & Sci, Toowoomba, Qld, Australia
Funding
National Science Foundation (USA);
Keywords
FREQUENT ITEMSETS; ALGORITHM; PATTERNS; TREE; CONSTRAINTS;
DOI
10.1002/widm.1207
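Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification code
140502 [Artificial intelligence];
Abstract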
Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that frequently appear together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally as the task of discovering groups of attribute values that frequently co-occur in databases. Because of its numerous applications in domains such as bioinformatics, text mining, product recommendation, e-learning, and web click stream analysis, itemset mining has become a popular research area. This study provides an up-to-date survey that can serve both as an introduction and as a guide to recent advances and opportunities in the field. The problem of frequent itemset mining and its applications are described. Moreover, the main approaches and strategies for solving itemset mining problems are presented, along with their characteristics. Limitations of traditional frequent itemset mining approaches are also highlighted, and extensions of the itemset mining task are presented, such as high-utility itemset mining, rare itemset mining, fuzzy itemset mining, and uncertain itemset mining. This study also discusses research opportunities and the relationship to other popular pattern mining problems, such as sequential pattern mining, episode mining, subgraph mining, and association rule mining. The main open-source libraries of itemset mining implementations are also briefly presented. © 2017 John Wiley & Sons, Ltd
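As an illustration of the frequent itemset mining task described in the abstract, the following is a minimal sketch, not an algorithm taken from the survey. It enumerates every itemset of every transaction by brute force and keeps those that occur in at least `min_support` transactions. The function name `frequent_itemsets`, the parameter `min_support` (an absolute transaction count), and the toy transactions are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in
    at least `min_support` transactions (hypothetical helper, brute force)."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        # Count every non-empty subset of the transaction exactly once.
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Toy market-basket data (made up for illustration).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

Enumerating all subsets is exponential in transaction length, so this sketch is only practical for tiny inputs. It is meant to show what is being counted, not how to count it efficiently.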
Pages: 18