Estimating the number of clusters

Cited by: 74
Authors
Cuevas, A [1]
Febrero, M [1]
Fraiman, R [1]
Affiliation
[1] Univ Autonoma Madrid, Dept Matemat, E-28049 Madrid, Spain
Source
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE | 2000 / Vol. 28 / No. 2
Keywords
cluster analysis; density estimates; level sets; number of modes; smoothed bootstrap; support estimation;
DOI
10.2307/3315985
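Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract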
Hartigan (1975) defines the number q of clusters in a d-variate statistical population as the number of connected components of the set {f > c}, where f denotes the underlying density function on R^d and c is a given constant. Some common clustering algorithms treat q as an input that must be given in advance. The authors propose a method for estimating this parameter, based on computing the number of connected components of an estimate of {f > c}. This set estimator is constructed as a union of balls with centres at an appropriate subsample, which is selected via a nonparametric density estimator of f. The asymptotic behaviour of the proposed method is analyzed. A simulation study and an example with real data are also included.
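The construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth `h`, the ball radius `eps`, the rule "keep the sample points whose estimated density exceeds c", and the function names are all assumptions made for illustration. Two closed balls of radius `eps` intersect exactly when their centres are at most `2 * eps` apart, so the connected components of the union of balls can be counted with a union-find over the kept points.

```python
import math


def gaussian_kde(points, h):
    """Return x -> Gaussian kernel density estimate with bandwidth h (assumed kernel)."""
    n, d = len(points), len(points[0])
    norm = n * (h * math.sqrt(2.0 * math.pi)) ** d

    def f_hat(x):
        total = 0.0
        for p in points:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-0.5 * sq / (h * h))
        return total / norm

    return f_hat


def estimate_num_clusters(points, c, h, eps):
    """Count connected components of the union of eps-balls centred at the
    sample points whose estimated density exceeds c (illustrative rule)."""
    f_hat = gaussian_kde(points, h)
    kept = [p for p in points if f_hat(p) > c]

    # Union-find over the kept centres.
    parent = list(range(len(kept)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            # Balls of radius eps overlap iff the centres are within 2 * eps.
            if math.dist(kept[i], kept[j]) <= 2.0 * eps:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(kept))})


def toy_sample():
    """Hypothetical data: two 5x5 grids (spacing 0.5) plus two isolated outliers."""
    pts = []
    for cx in (0.0, 10.0):
        pts += [(cx + 0.5 * i, 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]
    pts += [(5.0, 5.0), (5.0, -5.0)]
    return pts
```

On this toy sample, `estimate_num_clusters(toy_sample(), c=0.02, h=0.5, eps=0.3)` discards the two low-density outliers and returns 2, while setting `c=0.0` keeps every point and returns 4. The pairwise loop is quadratic in the number of kept points, which is acceptable for a sketch but not for large samples.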
Pages: 367-382
Number of pages: 16
Related papers
27 in total
[1]  
Anderberg M. R., 1973, CLUSTER ANAL APPL, DOI DOI 10.1016/C2013-0-06161-0
[2]  
Cuevas A, 1997, ANN STAT, V25, P2300
[3]  
CUEVAS A, 1998, UNPUB CLUSTER ANAL F
[4]  
DEVROYE L, 1985, NONPARAMETRIC DENSIT
[5]   DETECTION OF ABNORMAL-BEHAVIOR VIA NONPARAMETRIC-ESTIMATION OF THE SUPPORT [J].
DEVROYE, L ;
WISE, GL .
SIAM JOURNAL ON APPLIED MATHEMATICS, 1980, 38 (03) :480-488
[6]  
Everitt B., 1993, CLUSTER ANAL
[7]   Counting bumps [J].
Fraiman, R ;
Meloche, J .
ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 1999, 51 (03) :541-569
[8]  
GOOD IJ, 1980, J AM STAT ASSOC, V75, P42, DOI 10.2307/2287377
[9]  
Gyorfi L., 1985, The L1 View
[10]   Approximations to distributions of statistics used for testing hypotheses about the number of modes of a population [J].
Hall, P ;
Wood, ATA .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1996, 55 (03) :299-317