Mining frequent tree-like patterns in large datasets

Cited by: 15
Authors
Chen, Tzung-Shi [1]
Hsu, Shih-Chun [1]
Affiliations
[1] Natl Univ Tainan, Dept Informat & Learning Technol, Tainan 700, Taiwan
Keywords
data mining; frequent patterns; sequential patterns; tree-like patterns; world wide web
DOI
10.1016/j.datak.2006.07.003
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sequential pattern mining is crucial to data mining domains. This paper proposes a novel data mining approach for exploring hierarchical tree structures, named tree-like patterns, representing the relationships for a pair of items in a sequence. Using tree-like patterns, the relationships for a pair of items can be identified in terms of the cause and effect. A novel technique that efficiently counts support values for tree-like patterns using a queue structure is proposed. In addition, this paper addresses an efficient scheme for determining the frequency of a tree-like pattern in a sequence using a dynamic programming approach. Each tree-like pattern embedded in a sequence is considered to have a certain valuable meaning or the degree of importance used in different applications. Two addressed formulas are applied to determine the degree of significance for a specific sequence, which denotes the degree of consecutive items in a tree-like pattern for a sequence. The larger the degree of significance a tree-like pattern has, the more the tree-like pattern is compacted in the sequence. The characteristics differentiating the explored patterns from those obtained with other schemes are discussed. A simulation analysis of the proposed data mining approach is utilized to demonstrate its efficacy. Finally, the proposed approach is designed and implemented in a data mining system integrated into a novel e-learning platform. (C) 2006 Elsevier B.V. All rights reserved.
Pages: 65-83
Page count: 19