Difficulties in drawing inferences with finite-mixture models: A simple example with a simple solution

Cited by: 47
Authors
Chung, H
Loken, E
Schafer, JL
Affiliations
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[2] Penn State Univ, Methodol Ctr, University Pk, PA 16802 USA
Keywords
EM algorithm; label switching; Markov chain Monte Carlo;
DOI
10.1198/0003130043286
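Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code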
020208 ; 070103 ; 0714 ;
Abstract
Likelihood functions from finite mixture models have many unusual features. Maximum likelihood (ML) estimates may behave poorly over repeated samples, and the abnormal shape of the likelihood often makes it difficult to assess the uncertainty in parameter estimates. Bayesian inference via Markov chain Monte Carlo (MCMC) can be a useful alternative to ML, but the component labels may switch during the MCMC run, making the output difficult to interpret. Two basic methods for handling the label-switching problem have been proposed: imposing constraints on the parameter space and cluster-based relabeling of the simulated parameters. We have found that label switching may also be reduced by supplying small amounts of prior information that are asymmetric with respect to the mixture components. Simply assigning one observation to each component a priori may effectively eliminate the problem. Using a very simple example (a univariate sample from a mixture of two exponentials), we evaluate the performance of likelihood and MCMC-based estimates and intervals over repeated sampling. Our simulations show that MCMC performs much better than ML if the label-switching problem is adequately addressed, and that asymmetric prior information performs as well as or better than the other proposed methods.
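To make the idea of pre-assigning one observation to each component concrete, here is a minimal sketch of a Gibbs sampler for a two-component exponential mixture. It is an illustration under stated assumptions, not the authors' implementation: the Beta(1, 1) prior on the mixing proportion, the Gamma(a, b) priors on the rates, the function names `simulate` and `gibbs_two_exponentials`, and the `fixed` argument are all hypothetical choices made for this example.

```python
import math
import random


def simulate(n, pi, rate0, rate1, seed=1):
    """Draw n values from pi*Exp(rate0) + (1 - pi)*Exp(rate1)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate0 if rng.random() < pi else rate1)
            for _ in range(n)]


def gibbs_two_exponentials(y, n_iter=2000, fixed=None, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for a mixture of two exponentials.

    `fixed` maps an observation index to a component label (0 or 1) that is
    held constant for the whole run. This is one way to encode a small amount
    of asymmetric prior information (an assumption of this sketch).
    Assumed priors: pi ~ Beta(1, 1), rate_k ~ Gamma(shape=a, rate=b).
    Returns a list of (pi, rate0, rate1) draws, one per iteration.
    """
    rng = random.Random(seed)
    fixed = fixed or {}
    # Start from random allocations, except for the pre-assigned observations.
    z = [fixed.get(i, rng.randrange(2)) for i in range(len(y))]
    draws = []
    for _ in range(n_iter):
        # Conjugate updates for the parameters given the allocations.
        counts = [z.count(0), z.count(1)]
        sums = [sum(v for v, k in zip(y, z) if k == c) for c in (0, 1)]
        pi = rng.betavariate(1 + counts[0], 1 + counts[1])
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        rates = [rng.gammavariate(a + counts[c], 1.0 / (b + sums[c]))
                 for c in (0, 1)]
        # Update each free allocation from its two-point conditional.
        for i, v in enumerate(y):
            if i in fixed:
                continue  # pre-assigned observations never change label
            w0 = pi * rates[0] * math.exp(-rates[0] * v)
            w1 = (1 - pi) * rates[1] * math.exp(-rates[1] * v)
            z[i] = 0 if rng.random() * (w0 + w1) < w0 else 1
        draws.append((pi, rates[0], rates[1]))
    return draws
```

A possible call, again only as an assumption about how the pre-assignment might be chosen, fixes the smallest observation to component 0 and the largest to component 1: `fixed = {y.index(min(y)): 0, y.index(max(y)): 1}`. Passing `fixed=None` gives the symmetric sampler, whose output may show label switching.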
Pages: 152-158
Page count: 7