A robust method for cluster analysis

Cited by: 74
Authors
Gallegos, MT [1]
Ritter, G [1]
Affiliation
[1] Univ Passau, Fak Math & Informat, D-94030 Passau, Germany
Keywords
cluster analysis; multivariate data; outliers; robustness; breakdown point; determinant criterion; minimal distance partition
DOI
10.1214/009053604000000940
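Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract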
Let there be given a contaminated list of n R^d-valued observations coming from g different, normally distributed populations with a common covariance matrix. We compute the ML-estimator with respect to a certain statistical model with n - r outliers for the parameters of the g populations; it detects outliers and simultaneously partitions their complement into g clusters. It turns out that the estimator unites both the minimum-covariance-determinant rejection method and the well-known pooled determinant criterion of cluster analysis. We also propose an efficient algorithm for approximating this estimator and study its breakdown points for the mean values and the pooled SSP matrix.
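The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible alternating scheme, not necessarily the authors' method. It assumes that r observations are retained and n - r are discarded, that the criterion to be made small is the determinant of the pooled within-cluster SSP matrix of the retained points, and that each update keeps the r points with the smallest Mahalanobis distance (with respect to the pooled covariance) to their nearest cluster mean. The function names `pooled_ssp`, `reduction_step` and `trimmed_determinant_clustering`, and the use of the label -1 for discarded points, are hypothetical choices made for illustration.

```python
import numpy as np

def pooled_ssp(X, labels, g):
    """Pooled within-cluster SSP matrix; label -1 marks discarded points."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for j in range(g):
        C = X[labels == j]
        if len(C):
            R = C - C.mean(axis=0)
            W += R.T @ R
    return W

def reduction_step(X, labels, g, r):
    """One illustrative update: re-estimate parameters, keep the r closest points.

    Assumes every cluster is non-empty and the pooled covariance is invertible.
    """
    means = np.stack([X[labels == j].mean(axis=0) for j in range(g)])
    V = pooled_ssp(X, labels, g) / r           # pooled covariance estimate
    Vinv = np.linalg.inv(V)
    diff = X[:, None, :] - means[None, :, :]   # shape (n, g, d)
    # Squared Mahalanobis distance of every point to every cluster mean.
    D = np.einsum("njd,de,nje->nj", diff, Vinv, diff)
    nearest = D.argmin(axis=1)
    dist = D.min(axis=1)
    new = np.full(len(X), -1)
    keep = np.argsort(dist)[:r]                # the n - r farthest points are discarded
    new[keep] = nearest[keep]
    return new

def trimmed_determinant_clustering(X, g, r, n_iter=50, seed=0):
    """Iterate reduction steps from a random start (a single start, for brevity)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = np.full(n, -1)
    keep = rng.choice(n, size=r, replace=False)
    labels[keep] = rng.permutation(np.arange(r) % g)
    for _ in range(n_iter):
        new = reduction_step(X, labels, g, r)
        sizes = np.bincount(new[new >= 0], minlength=g)
        # Stop at a fixed point, or if a cluster would become empty.
        if sizes.min() == 0 or np.array_equal(new, labels):
            break
        labels = new
    return labels, np.linalg.det(pooled_ssp(X, labels, g))
```

In practice one would probably run such a scheme from many random starts and keep the solution with the smallest determinant, since a single start can end in a poor local solution.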
Pages: 347-380
Page count: 34