Exploring the number of groups in robust model-based clustering

Cited by: 52
Authors
Garcia-Escudero, L. A. [1]
Gordaliza, A. [1]
Matran, C. [1]
Mayo-Iscar, A. [1]
Affiliations
[1] Univ Valladolid, Fac Ciencias, Dept Estadist & Invest Operat, Valladolid 47002, Spain
Keywords
Heterogeneous clusters; Number of groups; Strength of group-assignments; Trimming; MIXTURE MODEL; POINT; ESTIMATORS; ALGORITHMS;
DOI
10.1007/s11222-010-9194-z
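Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
080201 [Mechanical manufacturing and automation];
Abstract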
Two key questions in clustering problems are how to determine the number of groups properly and how to measure the strength of group assignments. These questions are especially difficult when a certain fraction of outlying data is also expected. Any answer to these two key questions should depend on the assumed probabilistic model, the allowed group scatters, and what we understand by noise. With this in mind, some exploratory "trimming-based" tools are presented in this work together with their justifications. These exploratory tools are based on monitoring the optimal values reached when solving a robust clustering criterion and on the use of some "discriminant" factors.
Pages: 585-599
Number of pages: 15
Related papers
46 entries in total
[1]
[Anonymous], 2000, Sankhya Ser. A, DOI 10.2307/25051289
[2]
MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[3]
Becker C, 1999, J AM STAT ASSOC, V94, P947
[4]
Assessing a mixture model for clustering with the integrated completed likelihood [J].
Biernacki, C ;
Celeux, G ;
Govaert, G .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2000, 22 (07) :719-725
[5]
Biernacki C., 1997, COMPUTING SCI STAT, V29, P451
[6]
Probabilistic models in cluster analysis [J].
Bock, HH .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1996, 23 (01) :5-28
[7]
LARGE-SAMPLE RESULTS FOR OPTIMIZATION-BASED CLUSTERING METHODS [J].
BRYANT, PG .
JOURNAL OF CLASSIFICATION, 1991, 8 (01) :31-44
[8]
Calinski T., 1974, Communications in Statistics-theory and Methods, V3, P1, DOI 10.1080/03610927408827101
[9]
A CLASSIFICATION EM ALGORITHM FOR CLUSTERING AND 2 STOCHASTIC VERSIONS [J].
CELEUX, G ;
GOVAERT, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 14 (03) :315-332
[10]
Celeux G., 1992, PATTERN RECOGN, V28, P781