Hyper-structure mining of frequent patterns in uncertain data streams

被引:11
作者
HewaNadungodage, Chandima [1 ]
Xia, Yuni [1 ]
Lee, Jaehwan John [2 ]
Tu, Yi-cheng [3 ]
机构
[1] Indiana Univ Purdue Univ, Dept Comp & Informat Sci, Indianapolis, IN 46202 USA
[2] Indiana Univ Purdue Univ, Dept Elect & Comp Engn, Indianapolis, IN 46202 USA
[3] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA
关键词
Data mining; Data stream; Data uncertainty; Frequent patterns; ITEMSETS; ALGORITHM;
D O I
10.1007/s10115-012-0581-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Data uncertainty is inherent in many real-world applications such as sensor monitoring systems, location-based services, and medical diagnostic systems. Moreover, many real-world applications are now capable of producing continuous, unbounded data streams. During the recent years, new methods have been developed to find frequent patterns in uncertain databases; nevertheless, very limited work has been done in discovering frequent patterns in uncertain data streams. The current solutions for frequent pattern mining in uncertain streams take a FP-tree-based approach; however, recent studies have shown that FP-tree-based algorithms do not perform well in the presence of data uncertainty. In this paper, we propose two hyper-structure-based false-positive-oriented algorithms to efficiently mine frequent itemsets from streams of uncertain data. The first algorithm, UHS-Stream, is designed to find all frequent itemsets up to the current moment. The second algorithm, TFUHS-Stream, is designed to find frequent itemsets in an uncertain data stream in a time-fading manner. Experimental results show that the proposed hyper-structure-based algorithms outperform the existing tree-based algorithms in terms of accuracy, runtime, and memory usage.
引用
收藏
页码:219 / 244
页数:26
相关论文
共 30 条
[1]  
Aggarwal CC, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P29
[2]  
AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
[3]  
Agrawal R., P 20 INT C VERY LARG
[4]  
[Anonymous], DATA MINING NEXT GEN
[5]  
Bernecker T, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P119
[6]  
Beyer K, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P359, DOI 10.1145/304181.304214
[7]  
Brijs T., 1999, Proceedings of the fth ACM SIGKDD inter- national conference on Knowledge discovery and data mining, P254, DOI 10.1145/312129.312241
[8]   A survey on algorithms for mining frequent itemsets over data streams [J].
Cheng, James ;
Ke, Yiping ;
Ng, Wilfred .
KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :1-27
[9]  
Chui CK, 2008, LECT NOTES ARTIF INT, V5012, P64, DOI 10.1007/978-3-540-68125-0_8
[10]  
Chui CK, 2007, LECT NOTES COMPUT SC, V4426, P47