A simulation study to compare robust clustering methods based on mixtures

Cited by: 46
Authors
Coretto, Pietro [1 ]
Hennig, Christian [2 ]
Affiliations
[1] Univ Salerno, Dipartimento Sci Econ & Stat, Fisciano, Italy
[2] UCL, Dept Stat Sci, London, England
Keywords
Model-based clustering; Gaussian mixture; Mixture of t-distributions; Noise component; MAXIMUM-LIKELIHOOD; ESTIMATORS;
DOI
10.1007/s11634-010-0065-4
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
The following mixture model-based clustering methods are compared in a simulation study with one-dimensional data, fixed number of clusters and a focus on outliers and uniform "noise": an ML-estimator (MLE) for Gaussian mixtures, an MLE for a mixture of Gaussians and a uniform distribution (interpreted as "noise component" to catch outliers), an MLE for a mixture of Gaussian distributions where a uniform distribution over the range of the data is fixed (Fraley and Raftery in Comput J 41:578-588, 1998), a pseudo-MLE for a Gaussian mixture with improper fixed constant over the real line to catch "noise" (RIMLE; Hennig in Ann Stat 32(4):1313-1340, 2004), and MLEs for mixtures of t-distributions with and without estimation of the degrees of freedom (McLachlan and Peel in Stat Comput 10(4):339-348, 2000). The RIMLE (using a method to choose the fixed constant first proposed in Coretto, The noise component in model-based clustering. Ph.D thesis, Department of Statistical Science, University College London, 2008) is the best method in some, and acceptable in all, simulation setups, and can therefore be recommended.
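As a rough illustration of the "improper fixed constant" idea behind the RIMLE, the sketch below runs EM-style iterations for a one-dimensional Gaussian mixture plus a constant noise pseudo-density. It is not the authors' implementation: the function name `fit_noise_mixture`, the default constant `c=0.01`, the starting values, and the variance floor `min_var` (a crude stand-in for a proper scale constraint) are all assumptions made for this example, and the constant is simply supplied by the user rather than chosen by the method from Coretto's thesis.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_noise_mixture(data, means, c=0.01, n_iter=200, min_var=1e-3):
    """EM-style fit of Gaussians plus a constant improper 'noise' density.

    Hypothetical helper for illustration only. `c` is the fixed constant
    (assumed given), and `min_var` is an assumed floor that keeps the
    variances away from zero.
    """
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    # props[0] is the noise proportion; props[1:] belong to the Gaussians.
    props = [0.1] + [0.9 / k] * k
    n = len(data)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior weight of noise and of each Gaussian per point.
        resp = []
        for x in data:
            parts = [props[0] * c]
            for j in range(k):
                parts.append(props[j + 1] * normal_pdf(x, means[j], variances[j]))
            total = sum(parts)
            resp.append([p / total for p in parts])
        # M-step: update proportions, then each Gaussian's mean and variance.
        props = [sum(r[j] for r in resp) / n for j in range(k + 1)]
        for j in range(k):
            weight = sum(r[j + 1] for r in resp)
            if weight <= 1e-12:
                continue  # component has lost all its points; leave it as is
            means[j] = sum(r[j + 1] * x for r, x in zip(resp, data)) / weight
            var = sum(r[j + 1] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / weight
            variances[j] = max(var, min_var)
    return props, means, variances, resp


if __name__ == "__main__":
    random.seed(1)
    # Two Gaussian clusters plus a few gross outliers (made-up data).
    sample = [random.gauss(0.0, 1.0) for _ in range(100)]
    sample += [random.gauss(10.0, 1.0) for _ in range(100)]
    sample += [50.0, -40.0, 80.0]
    props, means, variances, resp = fit_noise_mixture(sample, [-1.0, 11.0])
    print("noise proportion:", props[0])
    print("means:", means)
    print("variances:", variances)
```

Points whose Gaussian densities all fall far below `c` end up with most of their weight on the noise term, so they contribute little to the estimated means and variances.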
Pages: 111-135
Number of pages: 25
Related papers
18 in total
[1]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[2]  
CORETTO P, 2008, THESIS U COLL LONDON
[3]  
Cuesta-Albertos JA, 1997, ANN STAT, V25, P553
[4]   How many clusters? Which clustering method? Answers via model-based cluster analysis [J].
Fraley, C ;
Raftery, AE .
COMPUTER JOURNAL, 1998, 41 (08) :578-588
[5]  
Fraley C., 2006, MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering
[6]   A robust method for cluster analysis [J].
Gallegos, MT ;
Ritter, G .
ANNALS OF STATISTICS, 2005, 33 (01) :347-380
[7]   A general trimming approach to robust cluster analysis [J].
Garcia-Escudero, Luis A. ;
Gordaliza, Alfonso ;
Matran, Carlos ;
Mayo-Iscar, Agustin .
ANNALS OF STATISTICS, 2008, 36 (03) :1324-1345
[9]   Robustness of ML estimators of location-scale mixtures [J].
Hennig, C .
INNOVATIONS IN CLASSIFICATION, DATA SCIENCE, AND INFORMATION SYSTEMS, 2005, :128-137
[10]   Breakdown points for maximum likelihood estimators of location-scale mixtures [J].
Hennig, C .
ANNALS OF STATISTICS, 2004, 32 (04) :1313-1340