From sequential pattern mining to structured pattern mining: A pattern-growth approach

被引:29
作者
Han, JW [1 ]
Pei, J
Yan, XF
机构
[1] Univ Illinois, Urbana, IL 61801 USA
[2] SUNY Buffalo, Buffalo, NY 14260 USA
基金
加拿大自然科学与工程研究理事会; 美国国家科学基金会;
关键词
data mining; sequential pattern mining; structured pattern mining; scalability; performance analysis;
D O I
10.1007/BF02944897
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Sequential pattern mining is an important data mining problem with broad applications. However, it-is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP, a horizontal format-based sequential pattern mining method, and (ii) SPADE, a vertical format-based method; and (2) a pattern-growth method, represented by PrefixSpan and its further extensions, such as gSpan for mining structured patterns. In this study, we perform a systematic introduction and presentation of the pattern-growth methodology and study its principles and extensions. We first introduce two interesting pattern-growth algorithms, FreeSpan and PrefixSpan, for efficient sequential pattern mining. Then we introduce gSpan for mining structured patterns using the same methodology. Their relative performance in large databases is presented and analyzed. Several extensions of these methods are also discussed in the paper, including mining multi-level, multi-dimensional patterns and mining constraint-based patterns.
引用
收藏
页码:257 / 279
页数:23
相关论文
共 36 条
[1]  
AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
[2]  
Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
[3]  
[Anonymous], P ACM SIGMOD 98
[4]  
[Anonymous], 2000, P 6 ACM SIGKDD INT C
[5]  
ASAI T, 2002, P 2002 SIAM INT C DA
[6]   Constraint-based rule mining in large, dense databases [J].
Bayardo, RJ ;
Agrawal, R ;
Gunopulos, D .
15TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1999, :188-197
[7]  
Bettini C., 1998, B TECH COMMITTEE DAT, V21, P32
[8]  
Beyer K, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P359, DOI 10.1145/304181.304214
[9]  
Cormen T. H., 2001, Introduction to Algorithms, V2nd
[10]  
Frasca, 2000, SIGKDD EXPLORATIONS, V2, P86, DOI [10.1145/380995.381033, DOI 10.1145/380995.381033]