Comparing clusterings by the variation of information

Cited by: 361
Author
Meila, M [1]
Affiliation
[1] Univ Washington, Seattle, WA 98195 USA
Source
LEARNING THEORY AND KERNEL MACHINES | 2003 / Vol. 2777
Keywords
clustering; comparing partitions; measures of agreement; information theory; mutual information;
DOI
10.1007/978-3-540-45167-9_14
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering C to clustering C'. The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings. The basic properties of VI are presented and discussed from the point of view of comparing clusterings. In particular, the VI is positive, symmetric and obeys the triangle inequality. Thus, surprisingly enough, it is a true metric on the space of clusterings.
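As an illustration of the criterion described in the abstract, the following is a minimal sketch for hard clusterings only. It assumes the standard definition VI(C, C') = H(C|C') + H(C'|C), which is equivalent to H(C) + H(C') - 2 I(C, C'), with each clustering given as a list of cluster labels, one per data point. The function name `variation_of_information` and the use of natural logarithms (nats) are choices made for this example and do not come from the record itself.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI (in nats) between two hard clusterings of the same points.

    Each argument is a sequence of cluster labels; position i in both
    sequences refers to the same data point. (Hypothetical helper name.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data points")
    n = len(labels_a)
    if n == 0:
        raise ValueError("clusterings must be non-empty")

    count_a = Counter(labels_a)                  # cluster sizes in C
    count_b = Counter(labels_b)                  # cluster sizes in C'
    count_ab = Counter(zip(labels_a, labels_b))  # sizes of the intersections

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Contribution of this cell to H(C|C') + H(C'|C).
        vi -= p_ab * (log(p_ab / p_b) + log(p_ab / p_a))
    return vi


if __name__ == "__main__":
    c1 = [0, 0, 1, 1]
    c2 = [0, 1, 0, 1]
    print(variation_of_information(c1, c1))  # identical clusterings
    print(variation_of_information(c1, c2))  # statistically independent clusterings
```

For the two four-point clusterings in the usage lines, the first call gives zero, since a clustering carries no information relative to itself, and the second gives 2 log 2 (about 1.386 nats), since the two clusterings are independent and each has entropy log 2.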
Pages: 173-187
Page count: 15