Speed-up for the expectation-maximization algorithm for clustering categorical data

被引:18
作者
Jollois, F. -X.
Nadif, M.
机构
[1] Univ Paris 05, CRIP5, F-75270 Paris 06, France
[2] Univ Paul Verlaine Metz, LITA, UFR MIM, F-57045 Metz 1, France
关键词
mixture model; expectation-maximization algorithm; clustering; acceleration; categorical data;
D O I
10.1007/s10898-006-9059-3
中图分类号
C93 [管理学]; O22 [运筹学];
学科分类号
070105 [运筹学与控制论]; 12 [管理学]; 1201 [管理科学与工程]; 1202 [工商管理学]; 120202 [企业管理];
摘要
In model-based cluster analysis, the expectation-maximization (EM) algorithm has a number of desirable properties, but in some situations, this algorithm can be slow to converge. Some variants are proposed to speed-up EM in reducing the time spent in the E-step, in the case of Gaussian mixture. The main aims of such methods is first to speed-up convergence of EM, and second to yield same results (or not so far) than EM itself. In this paper, we compare these methods from categorical data, with the latent class model, and we propose a new variant that sustains better results on synthetic and real data sets, in terms of convergence speed-up and number of misclassified objects.
引用
收藏
页码:513 / 525
页数:13
相关论文
共 16 条
[1]
[Anonymous], 1996, ADV KNOWLEDGE DISCOV
[2]
A CLASSIFICATION EM ALGORITHM FOR CLUSTERING AND 2 STOCHASTIC VERSIONS [J].
CELEUX, G ;
GOVAERT, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 14 (03) :315-332
[3]
DAY NE, 1969, BIOMETRIKA, V56, P464
[4]
MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]
On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[6]
Comparison of the mixture and the classification maximum likelihood in cluster analysis with binary data [J].
Govaert, G ;
Nadif, M .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1996, 23 (01) :65-81
[7]
CONJUGATE-GRADIENT ACCELERATION OF THE EM ALGORITHM [J].
JAMSHIDIAN, M ;
JENNRICH, RI .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1993, 88 (421) :221-228
[8]
Lazarfeld P. F., 1968, LATENT STRUCTURE ANA
[9]
MACLACHLAN G, 2000, FINITE MIXTRE MODELS
[10]
McCallum A., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P169, DOI 10.1145/347090.347123