Hyper-structure mining of frequent patterns in uncertain data streams

被引：11

作者：

HewaNadungodage, Chandima ^{[1
]}

Xia, Yuni ^{[1
]}

Lee, Jaehwan John ^{[2
]}

Tu, Yi-cheng ^{[3
]}

机构：

[1] Indiana Univ Purdue Univ, Dept Comp & Informat Sci, Indianapolis, IN 46202 USA

[2] Indiana Univ Purdue Univ, Dept Elect & Comp Engn, Indianapolis, IN 46202 USA

[3] Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620 USA

来源：

KNOWLEDGE AND INFORMATION SYSTEMS | 2013年 / 37卷 / 01期

关键词：

Data mining; Data stream; Data uncertainty; Frequent patterns; ITEMSETS; ALGORITHM;

D O I：

10.1007/s10115-012-0581-y

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Data uncertainty is inherent in many real-world applications such as sensor monitoring systems, location-based services, and medical diagnostic systems. Moreover, many real-world applications are now capable of producing continuous, unbounded data streams. During the recent years, new methods have been developed to find frequent patterns in uncertain databases; nevertheless, very limited work has been done in discovering frequent patterns in uncertain data streams. The current solutions for frequent pattern mining in uncertain streams take a FP-tree-based approach; however, recent studies have shown that FP-tree-based algorithms do not perform well in the presence of data uncertainty. In this paper, we propose two hyper-structure-based false-positive-oriented algorithms to efficiently mine frequent itemsets from streams of uncertain data. The first algorithm, UHS-Stream, is designed to find all frequent itemsets up to the current moment. The second algorithm, TFUHS-Stream, is designed to find frequent itemsets in an uncertain data stream in a time-fading manner. Experimental results show that the proposed hyper-structure-based algorithms outperform the existing tree-based algorithms in terms of accuracy, runtime, and memory usage.

引用

页码：219 / 244

页数：26

共 30 条

[1]

Aggarwal CC, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P29

[2]

AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415

[3]

Agrawal R., P 20 INT C VERY LARG

[4]

[Anonymous], DATA MINING NEXT GEN

[5]

Bernecker T, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P119

[6]

Beyer K, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P359, DOI 10.1145/304181.304214

[7]

Brijs T., 1999, Proceedings of the fth ACM SIGKDD inter- national conference on Knowledge discovery and data mining, P254, DOI 10.1145/312129.312241

[8] A survey on algorithms for mining frequent itemsets over data streams [J].

Cheng, James ;

Ke, Yiping ;

Ng, Wilfred .

KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :1-27

[9]

Chui CK, 2008, LECT NOTES ARTIF INT, V5012, P64, DOI 10.1007/978-3-540-68125-0_8

[10]

Chui CK, 2007, LECT NOTES COMPUT SC, V4426, P47

← 1 2 3 →