An experimental comparison of model-based clustering methods

被引:133
作者
Meila, M [1 ]
Heckerman, D [1 ]
机构
[1] Microsoft Res, Redmond, WA 98052 USA
关键词
clustering; model-based clustering; naive-Bayes model; multinomial-mixture model; EM algorithm; agglomerative clustering; initialization;
D O I
10.1023/A:1007648401407
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We compare the three basic algorithms for model-based clustering on high-dimensional discrete-variable datasets. All three algorithms use the same underlying model: a naive-Bayes model with a hidden root node, also known as a multinomial-mixture model. In the first part of the paper, we perform an experimental comparison between three batch algorithms that learn the parameters of this model: the Expectation-Maximization (EM) algorithm, a "winner take all" version of the EM algorithm reminiscent of the K-means algorithm, and model-based agglomerative clustering. We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization methods on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of agglomerative clustering. Although the methods are substantially different, they lead to learned models that are similar in quality.
引用
收藏
页码:9 / 29
页数:21
相关论文
共 18 条
[1]  
[Anonymous], 1949, Human behaviour and the principle of least-effort
[2]   MODEL-BASED GAUSSIAN AND NON-GAUSSIAN CLUSTERING [J].
BANFIELD, JD ;
RAFTERY, AE .
BIOMETRICS, 1993, 49 (03) :803-821
[3]  
BAUER E, 1997, P 13 C UNC ART INT P
[4]   A CLASSIFICATION EM ALGORITHM FOR CLUSTERING AND 2 STOCHASTIC VERSIONS [J].
CELEUX, G ;
GOVAERT, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 14 (03) :315-332
[5]  
CHEESEMAN P, 1995, ADV KNOWLEDGE DISCOV, P153
[6]   Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables [J].
Chickering, DM ;
Heckerman, D .
MACHINE LEARNING, 1997, 29 (2-3) :181-212
[7]  
Clogg CC, 1995, HDB STAT MODELING SO, P311, DOI DOI 10.1007/978-1-4899-1292-3_6
[8]  
DeGroot M., 1970, OPTIMAL STAT DECISIO
[9]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[10]  
Dobson AJ., 1990, An introduction to generalized linear models