Clustering by passing messages between data points

被引:4736
作者
Frey, Brendan J. [1 ]
Dueck, Delbert [1 ]
机构
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
关键词
D O I
10.1126/science.1136800
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such "exemplars" can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called "affinity propagation," which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.
引用
收藏
页码:972 / 976
页数:5
相关论文
共 19 条
[1]   Near optimum error correcting coding and decoding: Turbo-codes [J].
Berrou, C ;
Glavieux, A .
IEEE TRANSACTIONS ON COMMUNICATIONS, 1996, 44 (10) :1261-1271
[2]   A constant-factor approximation algorithm for the k-median problem [J].
Charikar, M ;
Guha, S ;
Tardos, E ;
Shmoys, DB .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2002, 65 (01) :129-149
[3]  
DASGUPTA S, 2000, P 16 C UNC ART INT, P152
[4]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]   Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs [J].
Frey, BJ ;
Mohammad, N ;
Morris, QD ;
Zhang, W ;
Robinson, MD ;
Mnaimneh, S ;
Chang, R ;
Pan, Q ;
Sat, E ;
Rossant, J ;
Bruneau, BG ;
Aubin, JE ;
Blencowe, BJ ;
Hughes, TR .
NATURE GENETICS, 2005, 37 (09) :991-996
[6]  
FREY BJ, 2002, P 14 C NIPS, P737
[7]   NEURAL NETWORKS AND PHYSICAL SYSTEMS WITH EMERGENT COLLECTIVE COMPUTATIONAL ABILITIES [J].
HOPFIELD, JJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA-BIOLOGICAL SCIENCES, 1982, 79 (08) :2554-2558
[8]   A split-merge Markov chain Monte Carlo procedure for the dirichlet process mixture model [J].
Jain, S ;
Neal, RM .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2004, 13 (01) :158-182
[9]   Good error-correcting codes based on very sparse matrices [J].
MacKay, DJC .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1999, 45 (02) :399-431
[10]  
MacQueen J., 1967, Proc fifth Berkeley Symp Math Stat Probab, V1, P281