A comparison of extrinsic clustering evaluation metrics based on formal constraints

Cited by: 474
Authors
Amigo, Enrique [1]
Gonzalo, Julio [1]
Artiles, Javier [1]
Verdejo, Felisa [1]
Affiliation
[1] Univ Nacl Educ Distancia, Dept Lenguajes & Sistemas Informat, Madrid, Spain
Source
INFORMATION RETRIEVAL | 2009 / Vol. 12 / Issue 4
Keywords
Clustering; Evaluation metrics; Formal constraints
DOI
10.1007/s10791-008-9066-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As BCubed cannot be directly applied to this task, we propose a modified version of BCubed that avoids the problems found with other metrics.
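For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of BCubed precision and recall in their standard non-overlapping form. It is not taken from the article, and it does not cover the modified version the authors propose for overlapping clustering. The function name `bcubed` and the dictionary-based input format are assumptions made for illustration.

```python
from collections import Counter


def bcubed(clusters, categories):
    """Return (precision, recall, f1) for a non-overlapping clustering.

    clusters:   dict mapping each item to its system-assigned cluster label
    categories: dict mapping each item to its gold-standard category label
    Assumes both dicts cover the same non-empty set of items (illustrative API).
    """
    items = list(categories)
    n = len(items)

    # Sizes of each cluster, each category, and each cluster/category overlap.
    cluster_size = Counter(clusters[i] for i in items)
    category_size = Counter(categories[i] for i in items)
    overlap = Counter((clusters[i], categories[i]) for i in items)

    # Per-item precision: share of the item's cluster that has its category.
    precision = sum(
        overlap[(clusters[i], categories[i])] / cluster_size[clusters[i]]
        for i in items
    ) / n

    # Per-item recall: share of the item's category that sits in its cluster.
    recall = sum(
        overlap[(clusters[i], categories[i])] / category_size[categories[i]]
        for i in items
    ) / n

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four items, two system clusters, two gold categories.
system = {"a": 1, "b": 1, "c": 1, "d": 2}
gold = {"a": "x", "b": "x", "c": "y", "d": "y"}
print(bcubed(system, gold))
```

Because precision and recall are averaged per item rather than per cluster or per pair, every item contributes equally regardless of the size of its cluster.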
Pages: 461-486
Page count: 26