Efficient algorithms for mining maximal high utility itemsets from data streams with different models

被引:71
作者
Shie, Bai-En [1 ]
Yu, Philip S. [2 ]
Tseng, Vincent S. [1 ]
机构
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
[2] Univ Illinois, Dept Comp Sci, Chicago, IL USA
关键词
High utility itemset; Maximal pattern; Utility mining; Data stream mining; FREQUENT CLOSED ITEMSETS;
D O I
10.1016/j.eswa.2012.05.035
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Data stream mining is an emerging research topic in the data mining field. Finding frequent itemsets is one of the most important tasks in data stream mining with wide applications like online e-business and web click-stream analysis. However, two main problems existed in relevant studies: (1) The utilities (e.g., importance or profits) of items are not considered. Actual utilities of patterns cannot be reflected in frequent itemsets. (2) Existing utility mining methods produce too many patterns and this makes it difficult for the users to filter useful patterns among the huge set of patterns. In view of this, in this paper we propose a novel framework, named GUIDE (Generation of maximal high Utility Itemsets from Data strEams), to find maximal high utility itemsets from data streams with different models. i.e., landmark, sliding window and time fading models. The proposed structure, named MUI-Tree (Maximal high Utility Itemset Tree), maintains essential information for the mining processes and the proposed strategies further facilitates the performance of GUIDE. Main contributions of this paper are as follows: (1) To the best of our knowledge, this is the first work on mining the compact form of high utility patterns from data streams; (2) GUIDE is an effective one-pass framework which meets the requirements of data stream mining; (3) GUIDE generates novel patterns which are not only high utility but also maximal, which provide compact and insightful hidden information in the data streams. Experimental results show that our approach outperforms the state-of-the-art algorithms under various conditions in data stream environments on different models. (c) 2012 Elsevier Ltd. All rights reserved.
引用
收藏
页码:12947 / 12960
页数:14
相关论文
共 31 条
[1]  
Agrawal R., 1994, P 20 INT C VER LARG, P487, DOI DOI 10.5555/645920.672836
[2]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[3]  
Bifet Albert., 2011, P 17 ACM SIGKDD INT, P591, DOI DOI 10.1145/2020408.2020501
[4]  
Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[5]   Maintaining frequent closed itemsets over a sliding window [J].
Cheng, James ;
Ke, Yiping ;
Ng, Wilfred .
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2008, 31 (03) :191-215
[6]   A survey on algorithms for mining frequent itemsets over data streams [J].
Cheng, James ;
Ke, Yiping ;
Ng, Wilfred .
KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :1-27
[7]   Moment: Maintaining closed frequent itemsets over a stream sliding window [J].
Chi, Y ;
Wang, HX ;
Yu, PS ;
Muntz, RR .
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, :59-66
[8]  
Gaber MM, 2005, SIGMOD REC, V34, P18, DOI 10.1145/1083784.1083789
[9]  
Giannella C., 2003, Next Gener. Data Min, V212, P191
[10]  
Golab L, 2003, SIGMOD REC, V32, P5, DOI 10.1145/776985.776986