Mining sequential patterns using graph search techniques

Cited by: 15
Authors
Huang, YF
Lin, SY
Source
27TH ANNUAL INTERNATIONAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE, PROCEEDINGS | 2003
DOI
10.1109/CMPSAC.2003.1245314
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Sequential pattern discovery has emerged as an important problem in data mining. In this paper, we propose an effective algorithm, GST, for mining sequential patterns in a large transaction database. Unlike Apriori-like algorithms, GST can find large k-sequences (k >= 3) out of order; that is, it can find large k-sequences without deriving them directly from large (k-1)-sequences. As a result, it performs much better than Apriori-like algorithms. We also propose a method for finding new sequential patterns by scanning only the new transactions added to the database. In several comprehensive experiments, GST achieves a significant performance improvement over Apriori-like algorithms. We also found that, as long as the ratio of items purchased in new transactions is not close to 100%, scanning only the new transactions is always much better than scanning the entire database.
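The abstract does not describe how GST works, so the following is not the authors' algorithm. It is only a minimal Python sketch of two ideas the abstract relies on: counting the support of a candidate sequence, and updating stored counts by scanning only newly added data. It assumes each customer sequence is a list of single items and that new data arrives as whole new customer sequences; the names `contains`, `count_support`, and `update_support` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Pattern) -> bool:
    """True if `pattern` appears in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must occur later.
    return all(item in it for item in pattern)


def count_support(db: List[Sequence[str]],
                  patterns: List[Pattern]) -> Dict[Pattern, int]:
    """Full scan: number of sequences in `db` that contain each pattern."""
    return {p: sum(contains(s, p) for s in db) for p in patterns}


def update_support(counts: Dict[Pattern, int],
                   new_db: List[Sequence[str]]) -> Dict[Pattern, int]:
    """Incremental scan: add the support found in the new sequences only."""
    return {p: c + sum(contains(s, p) for s in new_db)
            for p, c in counts.items()}


# Toy data (invented for illustration, not from the paper).
old_db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
new_db = [["a", "b", "c", "d"], ["c", "a"]]
patterns = [("a", "c"), ("a", "b", "c")]

counts = count_support(old_db, patterns)
updated = update_support(counts, new_db)

# The incremental result matches a full rescan of old + new data.
assert updated == count_support(old_db + new_db, patterns)
```

This sketch only updates patterns whose counts were already stored. A pattern that was never counted over the old data cannot be handled this way without extra bookkeeping or a rescan.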
Pages: 4-9
Page count: 6