Moment: Maintaining closed frequent itemsets over a stream sliding window

被引:144
作者
Chi, Y [1 ]
Wang, HX [1 ]
Yu, PS [1 ]
Muntz, RR [1 ]
机构
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
来源
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2004年
关键词
D O I
10.1109/ICDM.2004.10084
中图分类号
TP18 [人工智能理论];
学科分类号
081104 [模式识别与智能系统]; 0812 [计算机科学与技术]; 0835 [软件工程]; 1405 [智能科学与技术];
摘要
This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding-window. The selected itemsets consist of a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET Our experiments show that our algorithm performs much better than previous approaches.
引用
收藏
页码:59 / 66
页数:8
相关论文
共 18 条
[1]
Agarwal R., 1994, P 20 INT C VER LARG, V487, P499
[2]
A tree projection algorithm for generation of frequent item sets [J].
Agarwal, RC ;
Aggarwal, CC ;
Prasad, VVV .
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (03) :350-371
[3]
ASAI T, 2002, P 2002 INT C DAT MIN
[4]
BAYARDO R, 1998, P ACM SIGMOD
[5]
CHANG JH, 2003, P 2003 INT C KNOWL D
[6]
CHARIKAR M, 2002, P 29 INT COLL AUT LA
[7]
CHEUNG D, 1996, P 12 INT C DAT ENG
[8]
CHEUNG DW, 1997, P 5 INT C DAT SYST A
[9]
CHI Y, 2004, CATCH MOMENT MAINTAI
[10]
Giannella C, 2003, TR587 IND U