GAUSSIAN PARSIMONIOUS CLUSTERING MODELS

Cited by: 559
Authors
CELEUX, G
GOVAERT, G
Affiliations
[1] INST NATL RECH INFORMAT & AUTOMAT,F-78153 LE CHESNAY,FRANCE
[2] UNIV TECHNOL COMPIEGNE,CNRS,URA 817,HEUDIASYC LAB,F-60206 COMPIEGNE,FRANCE
Keywords
GAUSSIAN MIXTURE; EIGENVALUE DECOMPOSITION; CLUSTER VOLUMES
DOI
10.1016/0031-3203(94)00125-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gaussian clustering models are useful both for understanding and suggesting powerful criteria. Banfield and Raftery, Biometrics 49, 803-821 (1993), have considered a parameterization of the variance matrix $\Sigma_k$ of a cluster $P_k$ in terms of its eigenvalue decomposition, $\Sigma_k = \lambda_k D_k A_k D_k'$, where $\lambda_k$ defines the volume of $P_k$, $D_k$ is an orthogonal matrix which defines its orientation and $A_k$ is a diagonal matrix with determinant 1 which defines its shape. This parameterization allows us to propose many general clustering criteria, from the simplest one (spherical clusters with equal volumes, which leads to the classical k-means criterion) to the most complex one (unknown and different volumes, orientations and shapes for all clusters). Methods of optimization to derive the maximum likelihood estimates as well as the practical usefulness of these models are discussed. We especially analyse the influence of the volumes of clusters. We report Monte Carlo simulations and an application on stellar data which dramatically illustrate the relevance of allowing clusters to have different volumes.
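The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `decompose_covariance` and the example matrix are hypothetical, and the volume is computed as the d-th root of the determinant, which is what the stated constraints (orthogonal D, diagonal A with determinant 1) imply for a d-dimensional covariance matrix.

```python
import numpy as np

def decompose_covariance(sigma):
    """Split a symmetric positive definite matrix into (lam, D, A)
    such that sigma = lam * D @ A @ D.T, with D orthogonal and det(A) = 1.
    Hypothetical helper for illustration only."""
    d = sigma.shape[0]
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns).
    eigvals, D = np.linalg.eigh(sigma)
    # Reorder so the largest eigenvalue comes first (a convention, assumed here).
    order = np.argsort(eigvals)[::-1]
    eigvals, D = eigvals[order], D[:, order]
    # Volume: det(sigma) = lam**d * det(A) = lam**d, so lam = det(sigma)**(1/d).
    lam = np.prod(eigvals) ** (1.0 / d)
    # Shape: normalized eigenvalues on the diagonal, determinant 1 by construction.
    A = np.diag(eigvals / lam)
    return lam, D, A

# Made-up 2x2 covariance matrix, used only to exercise the function.
sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
lam, D, A = decompose_covariance(sigma)
reconstructed = lam * D @ A @ D.T
```

In this parameterization, forcing A to be the identity and lam to be the same for every cluster gives the spherical, equal-volume case that the abstract associates with the k-means criterion; letting lam vary by cluster is the relaxation whose influence the paper studies.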
Pages: 781-793
Number of pages: 13
Related papers
16 in total
[1]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[2]   A CLASSIFICATION EM ALGORITHM FOR CLUSTERING AND 2 STOCHASTIC VERSIONS [J].
CELEUX, G ;
GOVAERT, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 14 (03) :315-332
[3]  
Celeux G., 1993, J STAT COMPUT SIM, V47, P127
[4]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]   AN ALGORITHM FOR SIMULTANEOUS ORTHOGONAL TRANSFORMATION OF SEVERAL POSITIVE DEFINITE SYMMETRICAL-MATRICES TO NEARLY DIAGONAL FORM [J].
FLURY, BN ;
GAUTSCHI, W .
SIAM JOURNAL ON SCIENTIFIC AND STATISTICAL COMPUTING, 1986, 7 (01) :169-184
[6]   COMMON PRINCIPAL COMPONENTS IN K-GROUPS [J].
FLURY, BN .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1984, 79 (388) :892-898
[7]   ERROR RATES IN QUADRATIC DISCRIMINATION WITH CONSTRAINTS ON THE COVARIANCE MATRICES [J].
FLURY, BW ;
SCHMID, MJ ;
NARAYANAN, A .
JOURNAL OF CLASSIFICATION, 1994, 11 (01) :101-120
[8]   ON SOME INVARIANT CRITERIA FOR GROUPING DATA [J].
FRIEDMAN, HP ;
RUBIN, J .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1967, 62 (320) :1159-&
[9]   MULTIVARIATE CLUSTERING PROCEDURES WITH VARIABLE METRICS [J].
MARONNA, R ;
JACOVKIS, PM .
BIOMETRICS, 1974, 30 (03) :499-505
[10]   SEPARATING MIXTURES OF NORMAL DISTRIBUTIONS [J].
MARRIOTT, FHC .
BIOMETRICS, 1975, 31 (03) :767-769