SPADE: An efficient algorithm for mining frequent sequences

被引:1703
作者
Zaki, MJ [1 ]
机构
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
关键词
sequence mining; sequential patterns; frequent patterns; data mining; knowledge discovery;
D O I
10.1023/A:1007652502315
中图分类号
TP18 [人工智能理论];
学科分类号
081104 [模式识别与智能系统]; 0812 [计算机科学与技术]; 0835 [软件工程]; 1405 [智能科学与技术];
摘要
In this paper we present SPADE, a new algorithm for fast discovery of Sequential Patterns. The existing solutions to this problem make repeated database scans, and use complex hash structures which have poor locality. SPADE utilizes combinatorial properties to decompose the original problem into smaller sub-problems, that can be independently solved in main-memory using efficient lattice search techniques, and using simple join operations. All sequences are discovered in only three database scans. Experiments show that SPADE outperforms the best previous algorithm by a factor of two, and by an order of magnitude with some pre-processed data. It also has linear scalability with respect to the number of input-sequences, and a number of other database parameters. Finally, we discuss how the results of sequence mining can be applied in a real application domain.
引用
收藏
页码:31 / 60
页数:30
相关论文
共 15 条
[1]
Agrawal R., 1996, Advances in Knowledge Discovery and Data Mining, P307
[2]
[Anonymous], 1997, P 3 KDD
[3]
[Anonymous], 1995, 11 INT C DAT ENG
[4]
Davey B., 1990, INTRO LATTICES ORDER
[5]
FERGUSON G, 1998, 15 NAT C ART INT
[6]
HATONEN K, 1996, 12 INT C DAT ENG
[7]
LESH N, 1998, 15 NAT C ART INT
[8]
MANNILA H, 1995, P 1 INT C KNOWL DISC
[9]
MANNILA H, 1996, 2 INT C KNOWL DISC D
[10]
OATES T, 1997, 6 INT WORKSH AI STAT