Finding recently frequent itemsets adaptively over online transactional data streams

被引:30
作者
Chang, Joong Hyuk [1 ]
Lee, Won Suk [1 ]
机构
[1] Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea
关键词
recently frequent itemsets; data streams; information decay; decay rate; delayed insertion; itemset pruning; lexicographic tree;
D O I
10.1016/j.is.2005.04.001
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Consequently, the knowledge embedded in a data stream is more likely to be changed as time goes by. Identifying the recent change of a data stream, especially for an online data stream, can provide valuable information for the analysis of the data stream. However, most of mining algorithms or frequency approximation algorithms over a data stream do not differentiate the information of recently generated data elements from the obsolete information of old data elements which may be no longer useful or possibly invalid at present. Therefore, they are not able to extract the recent change of information in a data stream adaptively. This paper proposes a data mining method for finding recently frequent itemsets adaptively over an online transactional data stream. The effect of old transactions on the current mining result of a data steam is diminished by decaying the old occurrences of each itemset as time goes by. Furthermore, several optimization techniques are devised to minimize processing time as well as memory usage. Finally, the performance of the proposed method is analyzed by a series of experiments to identify its various characteristics. (c) 2006 Elsevier B.V. All rights reserved.
引用
收藏
页码:849 / 869
页数:21
相关论文
共 26 条
  • [1] A tree projection algorithm for generation of frequent item sets
    Agarwal, RC
    Aggarwal, CC
    Prasad, VVV
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (03) : 350 - 371
  • [2] Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
  • [3] [Anonymous], P INT C VER LARG DAT
  • [4] Borders: An efficient algorithm for association generation in dynamic databases
    Aumann, Y
    Feldman, R
    Lipshtat, O
    Manilla, H
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 1999, 12 (01) : 61 - 73
  • [5] Babcock B, 2002, SIAM PROC S, P633
  • [6] Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
  • [7] Finding frequent items in data streams
    Charikar, M
    Chen, K
    Farach-Colton, M
    [J]. THEORETICAL COMPUTER SCIENCE, 2004, 312 (01) : 3 - 15
  • [8] Maintenance of discovered association rules in large databases: Art incremental updating technique
    Cheung, DW
    Han, JW
    Ng, VT
    Wong, CY
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 106 - 114
  • [9] CHEUNG DW, 1997, P 5 INT C DAT SYST A, P185
  • [10] Maintaining stream statistics over sliding windows
    Datar, M
    Gionis, A
    Indyk, P
    Motwani, R
    [J]. SIAM JOURNAL ON COMPUTING, 2002, 31 (06) : 1794 - 1813