Empirical and theoretical comparisons of selected criterion functions for document clustering

Cited by: 353
Authors
Zhao, Y [1]
Karypis, G [1]
Affiliations
[1] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA)
Keywords
partitional clustering; criterion function; data mining; information retrieval
DOI
10.1023/B:MACH.0000027785.44527.d6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper evaluates the performance of different criterion functions in the context of partitional clustering algorithms for document datasets. Our study involves a total of seven different criterion functions, three of which are introduced in this paper and four that have been proposed in the past. We present a comprehensive experimental evaluation involving 15 different datasets, as well as an analysis of the characteristics of the various criterion functions and their effect on the clusters they produce. Our experimental results show that there are a set of criterion functions that consistently outperform the rest, and that some of the newly proposed criterion functions lead to the best overall results. Our theoretical analysis shows that the relative performance of the criterion functions depends on (i) the degree to which they can correctly operate when the clusters are of different tightness, and (ii) the degree to which they can lead to reasonably balanced clusters.
Pages: 311-331
Number of pages: 21