A clustering procedure based on the comparison between the k nearest neighbors graph and the mininial spanning tree

被引:27
作者
González-Barrios, JM
Quiroz, AJ
机构
[1] Univ Nacl Autonoma Mexico, Dept Probabilidad & Estadist, Inst Invest Matemat Aplicadas & Sistemas, Mexico City, DF, Mexico
[2] Univ Simon Bolivar, CESMA, Caracas 89000, Venezuela
[3] Univ Simon Bolivar, Dept Computo Cientif & Estadist, Caracas 89000, Venezuela
关键词
nearest neighbors graph; minimal spanning tree; clustering;
D O I
10.1016/S0167-7152(02)00421-2
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We present a procedure for the identification of clusters in multivariate data sets, based on the comparison between the k nearest neighbors graph, G(k), and the minimal spanning tree, MST. Our key statistic is the random quantity (k) over tilde := the smallest k such that G(k) Contains the MST. Under regularity assumptions, we show that for i.i.d. data from a density on R-d with compact support having one connected component, (k) over tilde = O-Pr(log n), where n denotes sample size, a bound that seems to be sharp, according to simulations. This leads to a consistent test for the identification of crisp clusters. We illustrate the use of our procedure with an example. (C) 2003 Elsevier Science B.V. All rights reserved.
引用
收藏
页码:23 / 34
页数:12
相关论文
共 20 条
[1]   Graph-theoretic procedures for dimension identification [J].
Brito, MR ;
Quiroz, AJ ;
Yukich, JE .
JOURNAL OF MULTIVARIATE ANALYSIS, 2002, 81 (01) :67-84
[2]  
Dudley R.M., 1984, Ecole d'ete de Probabilites de Saint-Flour XII-1982, V1097, P1, DOI DOI 10.1007/BFB0099432
[3]  
Everitt B, 1974, CLUSTER ANAL
[4]   GRAPHICS FOR THE MULTIVARIATE 2-SAMPLE PROBLEM [J].
FRIEDMAN, JH ;
RAFSKY, LC .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1981, 76 (374) :277-287
[5]   MULTIVARIATE GENERALIZATIONS OF THE WALD-WOLFOWITZ AND SMIRNOV 2-SAMPLE TESTS [J].
FRIEDMAN, JH ;
RAFSKY, LC .
ANNALS OF STATISTICS, 1979, 7 (04) :697-717
[6]   GRAPH-THEORETIC MEASURES OF MULTIVARIATE ASSOCIATION AND PREDICTION [J].
FRIEDMAN, JH ;
RAFSKY, LC .
ANNALS OF STATISTICS, 1983, 11 (02) :377-391
[7]  
Harary Frank, GRAPH THEORY
[9]  
IHAKA R, 1996, J COMPUTATIONAL GRAP, V5, P299, DOI DOI 10.2307/1390807
[10]  
Jain K, 1988, Algorithms for clustering data