A general trimming approach to robust cluster analysis

Cited by: 154
Authors
Garcia-Escudero, Luis A. [1]
Gordaliza, Alfonso [1]
Matran, Carlos [1]
Mayo-Iscar, Agustin [1]
Affiliation
[1] Univ Valladolid, Dept Estadist & Invest Operat, E-47005 Valladolid, Spain
Keywords
robustness; cluster analysis; trimming; asymptotics; trimmed k-means; EM-algorithm; fast-MCD algorithm; Dykstra's algorithm
DOI
10.1214/07-AOS515
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
We introduce a new clustering method that aims to fit clusters with different scatters and weights. To guarantee robustness, the method allows a proportion α of contaminating data to be trimmed. As a characteristic feature, it imposes restrictions on the ratio between the maximum and the minimum eigenvalues of the groups' scatter matrices. These restrictions make the problem well defined and guarantee the consistency of the sample solutions to the population ones. Depending on the strength of the chosen restrictions, the method covers a wide range of clustering approaches. Our proposal includes an algorithm for approximately solving the sample problem.
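The abstract does not spell out the algorithm. As a rough illustration only, the Python/NumPy sketch below shows one iteration of a trimmed, weighted Gaussian classification step with an eigenvalue-ratio bound: each point is scored against every group, the points with the lowest best score are trimmed, and weights, means and scatter matrices are re-estimated from the retained points. Everything here is an assumption of the sketch rather than the paper's specification: the function names `gaussian_logpdf`, `restrict_eigenvalues` and `trimmed_cluster_step`, the bound parameter `c`, trimming exactly floor(n·α) points, and enforcing the bound by simply raising small eigenvalues to λ_max/c. The paper's own procedure for enforcing the restriction (its keywords mention Dykstra's algorithm) is not reproduced.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X."""
    p = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)


def restrict_eigenvalues(covs, c):
    """Force max/min eigenvalue ratio over ALL scatter matrices to be <= c.

    Hypothetical simplification: every eigenvalue below lam_max / c is raised
    to lam_max / c. This is not the paper's (optimal) constrained update.
    """
    decomps = [np.linalg.eigh(S) for S in covs]
    lam_max = max(vals.max() for vals, _ in decomps)
    floor_val = lam_max / c
    return [vecs @ np.diag(np.maximum(vals, floor_val)) @ vecs.T
            for vals, vecs in decomps]


def trimmed_cluster_step(X, weights, means, covs, alpha, c):
    """One trimmed classification step; label -1 marks trimmed points."""
    n, k = X.shape[0], len(means)
    # score[i, j] = log(weight_j) + log N(x_i; mean_j, cov_j)
    scores = np.column_stack([
        np.log(weights[j]) + gaussian_logpdf(X, means[j], covs[j])
        for j in range(k)
    ])
    labels = scores.argmax(axis=1)
    best = scores.max(axis=1)
    # Keep the n - floor(n * alpha) points with the highest best score.
    n_keep = n - int(np.floor(n * alpha))
    kept = np.zeros(n, dtype=bool)
    kept[np.argsort(best)[::-1][:n_keep]] = True
    new_w, new_m, new_S = [], [], []
    for j in range(k):
        Xj = X[(labels == j) & kept]  # assumes no group ends up empty
        new_w.append(len(Xj) / n_keep)
        new_m.append(Xj.mean(axis=0))
        new_S.append(np.cov(Xj, rowvar=False, bias=True))
    new_S = restrict_eigenvalues(new_S, c)
    return np.array(new_w), new_m, new_S, np.where(kept, labels, -1)


# Usage: two Gaussian groups plus 5 gross outliers, trimming alpha = 5%.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (50, 2)),
    rng.normal(10.0, 1.0, (50, 2)),
    50.0 + rng.normal(0.0, 0.1, (5, 2)),
])
w, m, S, labels = trimmed_cluster_step(
    X, np.array([0.5, 0.5]), [np.zeros(2), np.full(2, 10.0)],
    [np.eye(2), np.eye(2)], alpha=0.05, c=1.5)
```

In the toy call the five far-away points receive the lowest scores and are labeled -1 (trimmed), while the re-estimated scatter matrices satisfy the ratio bound `c`. A practical procedure would iterate such a step to convergence from several starting values; the sketch leaves that out.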
Pages: 1324-1345
Page count: 22