Hierarchical Clustering With Prototypes via Minimax Linkage

被引:100
作者
Bien, Jacob [1 ]
Tibshirani, Robert [1 ,2 ]
机构
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Hlth Res & Policy, Stanford, CA 94305 USA
基金
美国国家科学基金会; 美国国家卫生研究院;
关键词
Agglomerative; Dendrogram; Unsupervised learning; PATTERNS;
D O I
10.1198/jasa.2011.tm10183
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
070103 [概率论与数理统计]; 140311 [社会设计与社会创新];
摘要
Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a dataset. The nature of the clustering depends on the choice of linkage-that is, on how one measures the distance between clusters. In this article we investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original dataset with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid linkage) and is robust against certain perturbations of a dataset. We provide an efficient implementation and illustrate minimax linkage's strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a dataset of images of human faces.
引用
收藏
页码:1075 / 1084
页数:10
相关论文
共 20 条
[1]
Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[2]
[Anonymous], 2001, Approximation algorithms
[3]
[Anonymous], 1961, Adaptive Control Processes: a Guided Tour, DOI DOI 10.1515/9781400874668
[4]
CLUSTAG: hierarchical clustering and graph methods for selecting tag SNPs [J].
Ao, SI ;
Yip, K ;
Ng, M ;
Cheung, D ;
Fong, PY ;
Melhado, I ;
Sham, PC .
BIOINFORMATICS, 2005, 21 (08) :1735-1736
[5]
Hausdorff clustering [J].
Basalto, Nicolas ;
Bellotti, Roberto ;
De Carlo, Francesco ;
Facchi, Paolo ;
Pantaleo, Ester ;
Pascazio, Saverio .
PHYSICAL REVIEW E, 2008, 78 (04)
[6]
Hybrid hierarchical clustering with applications to microarray data [J].
Chipman, H ;
Tibshirani, R .
BIOSTATISTICS, 2006, 7 (02) :286-301
[7]
Cluster analysis and display of genome-wide expression patterns [J].
Eisen, MB ;
Spellman, PT ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) :14863-14868
[8]
Everitt B. S., 2001, CLUSTER ANAL
[9]
ADMISSIBLE CLUSTERING PROCEDURES [J].
FISHER, L ;
VANNESS, JW .
BIOMETRIKA, 1971, 58 (01) :91-&
[10]
Gersho A., 2012, Vector Quantization and Signal Compression, V159