Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions

被引:23
作者
Arias-Castro, Ery [1 ]
机构
[1] Univ Calif San Diego, Dept Math, La Jolla, CA 92093 USA
基金
美国国家科学基金会;
关键词
Clustering; detection in point clouds; extracting connected components; hierarchical clustering with single linkage; manifold learning; minimax rates; nearest-neighbor search; neighborhood graphs; random geometric graphs; spectral methods; MINIMAX; GRAPH; CLASSIFICATION; PERFORMANCE; CONSISTENCY; REDUCTION; RATES;
D O I
10.1109/TIT.2011.2104630
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes, and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guaranties for some emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; hierarchical clustering with single linkage; and the spectral clustering method of Ng, Jordan, and Weiss. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first and third methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
引用
收藏
页码:1692 / 1706
页数:15
相关论文
共 67 条
[1]   On spectral learning of mixtures of distributions [J].
Achlioptas, D ;
McSherry, R .
LEARNING THEORY, PROCEEDINGS, 2005, 3559 :458-469
[2]  
[Anonymous], 4 INT WORKSH DYN VIS
[3]  
[Anonymous], 2003, OXFORD STUDIES PROBA
[4]  
[Anonymous], P 3 ANN INT C COMP M
[5]  
[Anonymous], ADV NEURAL INFORM PR
[6]  
[Anonymous], 2005, SPRINGER MG MATH
[7]  
[Anonymous], 2002, P 8 ACM SIGKDD INT C, DOI DOI 10.1145/775047.775110
[8]   The minimum vertex degree of a graph on uniform points in [0,1](d) [J].
Appel, MJB ;
Russo, RP .
ADVANCES IN APPLIED PROBABILITY, 1997, 29 (03) :582-594
[9]   Connect the dots: How many random points can a regular curve pass through? [J].
Arias-Castro, E ;
Donoho, DL ;
Huo, XM ;
Tovey, CA .
ADVANCES IN APPLIED PROBABILITY, 2005, 37 (03) :571-603
[10]   Searching for a trail of evidence in a maze [J].
Arias-Castro, Ery ;
Candes, Emmanuel J. ;
Helgason, Hannes ;
Zeitouni, Ofer .
ANNALS OF STATISTICS, 2008, 36 (04) :1726-1757