Efficient mining of frequent episodes from complex sequences

Cited by: 65
Authors
Huang, Kuo-Yu [1]
Chang, Chia-Hui [1]
Affiliation
[1] National Central University, Department of Computer Science and Information Engineering, Chungli 320, Taiwan
Keywords
data mining; frequent episodes; temporal association
DOI
10.1016/j.is.2007.07.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Discovering patterns of great significance is an important problem in the data mining discipline. An episode is defined as a partially ordered set of events occurring within consecutive, fixed-length time intervals of a sequence. Most previous studies on episodes consider only frequent episodes in a sequence of individual events (called a simple sequence). In the real world, however, a set of events may be found at each time slot, at various interval granularities (hours, days, weeks, etc.). We refer to such sequences as complex sequences. Mining frequent episodes in complex sequences has more extensive applications than mining them in simple sequences. In this paper, we discuss the problem of mining frequent episodes in a complex sequence. We extend the previous algorithm MINEPI to MINEPI+ for episode mining from complex sequences. Furthermore, we introduce a memory-anchored algorithm called EMMA for this mining task. Experimental evaluation on both real-world and synthetic data sets shows that EMMA is more efficient than MINEPI+. (C) 2007 Elsevier B.V. All rights reserved.
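The abstract describes episodes over complex sequences, where each time slot carries a set of events rather than a single event. The sketch below illustrates one way to count an episode's support in such a sequence. It rests on assumptions made for illustration only: an episode is an ordered list of event sets, each matched at a strictly later time slot than the previous one; time stamps are strictly increasing integers; and support is the number of minimal occurrences whose span is at most `max_window`, in the spirit of MINEPI's minimal occurrences. The names `earliest_end`, `minimal_occurrences`, and `max_window` are hypothetical, and this is not the paper's MINEPI+ or EMMA algorithm.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Assumed representation: one (time, event set) pair per slot, times strictly increasing.
ComplexSequence = List[Tuple[int, FrozenSet[str]]]
# Assumed (hypothetical) episode shape: an ordered list of event sets.
Episode = Sequence[FrozenSet[str]]


def earliest_end(seq: ComplexSequence, episode: Episode, start: int) -> Optional[int]:
    """Earliest end time of an occurrence whose first element matches seq[start], else None."""
    ts, events = seq[start]
    if not episode[0] <= events:  # first event set must be a subset of this slot
        return None
    k, te = 1, ts
    for t, ev in seq[start + 1:]:
        if k == len(episode):
            break
        if episode[k] <= ev:  # greedily match the next element as early as possible
            k += 1
            te = t
    return te if k == len(episode) else None


def minimal_occurrences(seq: ComplexSequence, episode: Episode,
                        max_window: int) -> List[Tuple[int, int]]:
    """Minimal occurrences [ts, te] of the episode with te - ts <= max_window."""
    candidates = []
    for i in range(len(seq)):
        te = earliest_end(seq, episode, i)
        if te is not None:
            candidates.append((seq[i][0], te))
    result = []
    for j, (ts, te) in enumerate(candidates):
        # Earliest end times never decrease as the start moves right, so [ts, te]
        # is minimal unless the next candidate start ends at the same time.
        next_te = candidates[j + 1][1] if j + 1 < len(candidates) else None
        if (next_te is None or next_te > te) and te - ts <= max_window:
            result.append((ts, te))
    return result


if __name__ == "__main__":
    seq = [(1, frozenset({"A", "B"})), (2, frozenset({"C"})), (3, frozenset({"A"})),
           (4, frozenset({"B", "C"})), (5, frozenset({"C"}))]
    ep = [frozenset({"A"}), frozenset({"B", "C"})]
    print(minimal_occurrences(seq, ep, 3))  # [(3, 4)]
```

In the usage example, the occurrence starting at time 1 also ends at time 4, but it contains the shorter occurrence [3, 4], so only [3, 4] is minimal. Under these assumptions an episode would be frequent when `len(minimal_occurrences(...))` reaches a chosen minimum support.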
Pages: 96-114
Number of pages: 19