An integration of fuzzy association rules and WordNet for document clustering

Times cited: 17
Authors
Chen, Chun-Ling [1]
Tseng, Frank S. C.
Liang, Tyne [1]
Affiliation
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
Keywords
Fuzzy association rule mining; Text mining; Document clustering; Frequent itemsets; WordNet
DOI
10.1007/s10115-010-0364-2
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid growth of text documents, document clustering techniques are emerging for efficient document retrieval and better document browsing. Recently, some methods have been proposed that cluster documents using frequent itemsets derived from association rule mining, in order to address the problems of high dimensionality, scalability, accuracy, and meaningful cluster labels. To improve the quality of document clustering results, we propose an effective Fuzzy Frequent Itemset-based Document Clustering (F²IDC) approach that combines fuzzy association rule mining with the background knowledge embedded in WordNet. A term hierarchy generated from WordNet is used to discover generalized frequent itemsets, which serve as candidate cluster labels for grouping documents. We have conducted experiments to evaluate our approach on the Classic4, Re0, R8, and WebKB datasets. Our experimental results show that the proposed approach indeed provides more accurate clustering results than prior influential clustering methods presented in the recent literature.
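The abstract only outlines how a term hierarchy turns into generalized frequent itemsets and then into cluster labels. The Python sketch below illustrates that general idea under loudly stated assumptions; it is not the authors' algorithm. Assumptions: the small `HIERARCHY` dict is a hypothetical stand-in for WordNet hypernym links; a term's fuzzy membership in a document is its count divided by the document's largest count; a general term inherits the largest membership of any term below it; fuzzy support is the average over documents of the minimum membership among an itemset's terms (the min t-norm); and each document is labeled with the itemset it matches most strongly. All function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy hierarchy standing in for WordNet hypernym links:
# specific term -> more general term.
HIERARCHY = {"dog": "animal", "cat": "animal", "rose": "plant", "tulip": "plant"}


def ancestors(term, hierarchy):
    """Return every more general term above `term`, nearest first."""
    chain = []
    while term in hierarchy:
        term = hierarchy[term]
        chain.append(term)
    return chain


def fuzzy_memberships(tf, hierarchy):
    """Turn one document's raw term counts into memberships in [0, 1].

    Assumption: membership = count / largest count in the document, and a
    general term inherits the largest membership of any term below it.
    """
    peak = max(tf.values())
    mu = {term: count / peak for term, count in tf.items()}
    for term, value in list(mu.items()):  # snapshot: we add keys in the loop
        for general in ancestors(term, hierarchy):
            mu[general] = max(mu.get(general, 0.0), value)
    return mu


def fuzzy_support(itemset, docs_mu):
    """Average, over all documents, of the smallest membership in the itemset."""
    return sum(min(mu.get(t, 0.0) for t in itemset) for mu in docs_mu) / len(docs_mu)


def redundant(itemset, hierarchy):
    """True if one item is an ancestor of another (e.g. 'dog' with 'animal')."""
    return any(a in ancestors(b, hierarchy) or b in ancestors(a, hierarchy)
               for a, b in combinations(itemset, 2))


def frequent_itemsets(docs_mu, hierarchy, min_sup, max_size=2):
    """Apriori-style level-wise search for fuzzy frequent (generalized) itemsets."""
    terms = sorted({t for mu in docs_mu for t in mu})
    level = [(t,) for t in terms if fuzzy_support((t,), docs_mu) >= min_sup]
    frequent = {items: fuzzy_support(items, docs_mu) for items in level}
    size = 1
    while level and size < max_size:
        size += 1
        pool = sorted({t for items in level for t in items})
        # Keep candidates whose (size-1)-subsets are all frequent and that do
        # not pair a term with one of its own ancestors.
        candidates = [c for c in combinations(pool, size)
                      if not redundant(c, hierarchy)
                      and all(sub in frequent for sub in combinations(c, size - 1))]
        level = [c for c in candidates if fuzzy_support(c, docs_mu) >= min_sup]
        frequent.update({items: fuzzy_support(items, docs_mu) for items in level})
    return frequent


def assign_clusters(docs_mu, labels):
    """Put each document under the candidate label it matches most strongly.

    Ties go to the longer itemset, then to the later label in sort order;
    documents that match no label at all are collected under None.
    """
    clusters = {}
    for i, mu in enumerate(docs_mu):
        scored = [(min(mu.get(t, 0.0) for t in items), len(items), items)
                  for items in labels]
        score, _, best = max(scored)
        clusters.setdefault(best if score > 0 else None, []).append(i)
    return clusters


# Usage on four tiny, made-up documents (raw term counts).
docs = [
    {"dog": 3, "park": 1},
    {"cat": 2, "park": 2},
    {"rose": 4, "garden": 1},
    {"tulip": 2, "garden": 2},
]
docs_mu = [fuzzy_memberships(tf, HIERARCHY) for tf in docs]
labels = frequent_itemsets(docs_mu, HIERARCHY, min_sup=0.4)
print(labels)  # {('animal',): 0.5, ('plant',): 0.5}
print(assign_clusters(docs_mu, labels))  # {('animal',): [0, 1], ('plant',): [2, 3]}
```

In this toy run no specific term ("dog", "cat", "rose", "tulip") reaches the 0.4 support threshold on its own, but their generalized parents do, which is the kind of effect a term hierarchy is meant to provide when mining candidate cluster labels.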
Pages: 687-708
Page count: 22
References
32 in total
[1] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2] [Anonymous], 1998, Proceedings in Use of WordNet in Natural Language Processing Systems
[3] Beil F., 2002, Int. Conf. Knowl. Disc. Data, P436
[4] Chen C.L., Tseng F.S.C., Liang T., 2010, Mining fuzzy frequent itemsets for hierarchical document clustering, Information Processing & Management, V46, N2, P193-211
[5] Chen C.L., 2008, 3rd Int. Conf. Inn. Comp. Inf., P326
[6] Craven M., 1998, AAAI-98
[7] Cutting D.R., 1992, SIGIR '92: Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P318
[8] Dave D.M.P., 2003, 12th Int. Conf. World Wide Web
[9] Exarchos T.P., Tsipouras M.G., Papaloukas C., Fotiadis D.I., 2009, An optimized sequential pattern matching methodology for sequence classification, Knowledge and Information Systems, V19, N2, P249-264
[10] Fung B.C.M., 2003, SIAM Proc. Ser., P59