Mixture models for classification

Cited by: 13
Authors
Celeux, Gilles [1]
Affiliation
[1] Inria Futurs, Orsay, France
Source
Advances in Data Analysis | 2007
DOI
10.1007/978-3-540-70981-7_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Finite mixture distributions provide efficient approaches to model-based clustering and classification. The advantages of mixture models for unsupervised classification are reviewed. The article then focuses on the model selection problem. The usefulness of taking the modeling purpose into account when selecting a model is advocated in both the unsupervised and supervised classification contexts. This point of view has led to the definition of two penalized likelihood criteria, ICL and BEC, which are presented and discussed. Criterion ICL is an approximation of the integrated completed likelihood and is concerned with model-based cluster analysis. Criterion BEC is an approximation of the integrated conditional likelihood and is concerned with generative models of classification. The behavior of ICL for choosing the number of components in a mixture model, and of BEC for choosing a model that minimizes the expected error rate, is analyzed in contrast with standard model selection criteria.
Pages: 3-14
Number of pages: 12
Related papers
31 in total
[11]   ASYMPTOTIC-BEHAVIOR OF CLASSIFICATION MAXIMUM LIKELIHOOD ESTIMATES [J].
BRYANT, P ;
WILLIAMSON, JA .
BIOMETRIKA, 1978, 65 (02) :273-281
[12]   CLUSTERING CRITERIA FOR DISCRETE-DATA AND LATENT CLASS MODELS [J].
CELEUX, G ;
GOVAERT, G .
JOURNAL OF CLASSIFICATION, 1991, 8 (02) :157-176
[13]   Stochastic versions of the EM algorithm: An experimental study in the mixture case [J].
Celeux, G ;
Chauveau, D ;
Diebolt, J .
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 1996, 55 (04) :287-314
[14]   A CLASSIFICATION EM ALGORITHM FOR CLUSTERING AND 2 STOCHASTIC VERSIONS [J].
CELEUX, G ;
GOVAERT, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 14 (03) :315-332
[15]  
CELEUX G, 1993, J COMPUTATIONAL SIMU, V14, P315
[16]   Penalized maximum likelihood estimator for normal mixtures [J].
Ciuperca, G ;
Ridolfi, A ;
Idier, J .
SCANDINAVIAN JOURNAL OF STATISTICS, 2003, 30 (01) :45-59
[17]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[18]  
DIEBOLT J, 1994, J ROY STAT SOC B MET, V56, P363
[19]   Unsupervised learning of finite mixture models [J].
Figueiredo, MAT ;
Jain, AK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (03) :381-396
[20]   How many clusters? Which clustering method? Answers via model-based cluster analysis [J].
Fraley, C ;
Raftery, AE .
COMPUTER JOURNAL, 1998, 41 (08) :578-588