An EM-Like Algorithm for Semi- and Nonparametric Estimation in Multivariate Mixtures

Cited by: 84
Authors
Benaglia, Tatiana [1]
Chauveau, Didier [2]
Hunter, David R. [1]
Affiliations
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[2] Univ Orleans, MAPMO, UMR 6628, F-45067 Orleans 2, France
Keywords
EM algorithm; Kernel density estimation; Multivariate mixture; Nonparametric mixture; INFERENCE;
DOI
10.1198/jcgs.2009.07175
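Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
070103 [Probability Theory and Mathematical Statistics]; 140311 [Social Design and Social Innovation];
Abstract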
We propose an algorithm for nonparametric estimation for finite mixtures of multivariate random vectors that strongly resembles a true EM algorithm. The vectors are assumed to have independent coordinates conditional upon knowing from which mixture component they come, but otherwise their density functions are completely unspecified. Sometimes, the density functions may be partially specified by Euclidean parameters, a case we call semiparametric. Our algorithm is much more flexible and easily applicable than existing algorithms in the literature; it can be extended to any number of mixture components and any number of vector coordinates of the multivariate observations. Thus it may be applied even in situations where the model is not identifiable, so care is called for when using it in situations for which identifiability is difficult to establish conclusively. Our algorithm yields much smaller mean integrated squared errors than an alternative algorithm in a simulation study. In another example using a real dataset, it provides new insights that extend previous analyses. Finally, we present two different variations of our algorithm, one stochastic and one deterministic, and find anecdotal evidence that there is not a great deal of difference between the performance of these two variants. The computer code and data used in this article are available online.
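The model described in the abstract (coordinates independent given the component, densities otherwise unspecified) can be illustrated with a minimal Python sketch of one possible deterministic EM-like iteration. This is not the authors' code. The function name `npem_sketch`, the Gaussian kernel, the single fixed bandwidth shared by all coordinates and components, and the caller-supplied starting posteriors `post0` are all assumptions made for illustration only.

```python
import numpy as np

def npem_sketch(x, post0, bandwidth, n_iter=30):
    """Illustrative EM-like iteration for a nonparametric mixture.

    x         : (n, r) data array, coordinates assumed conditionally independent
    post0     : (n, m) initial posterior membership probabilities (assumed input)
    bandwidth : fixed Gaussian-kernel bandwidth (an assumption of this sketch)
    Returns (weights, post): mixing proportions (m,) and posteriors (n, m).
    """
    n, r = x.shape
    post = np.asarray(post0, dtype=float)
    # kern[k, i, l] = Gaussian kernel at x[i, k] - x[l, k], one slab per coordinate
    diff = x.T[:, :, None] - x.T[:, None, :]
    kern = np.exp(-0.5 * (diff / bandwidth) ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    for _ in range(n_iter):
        # Mixing proportions: average posterior mass per component
        weights = post.mean(axis=0)
        # Posterior-weighted kernel density estimate of each coordinate density,
        # evaluated at the data: dens[k, i, j] estimates f_jk(x[i, k])
        dens = np.einsum("kil,lj->kij", kern, post) / (n * weights)
        dens = np.maximum(dens, 1e-300)  # guard against log(0)
        # Posterior update: weight times product over coordinates (in log space)
        log_joint = np.log(weights) + np.log(dens).sum(axis=0)
        log_joint -= log_joint.max(axis=1, keepdims=True)
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)
    return post.mean(axis=0), post
```

The sketch covers only a deterministic update; a stochastic variant of the kind the abstract mentions would replace the posterior weights with simulated component labels. Nothing in it checks identifiability, which the abstract flags as the user's responsibility.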
Pages: 505-526
Page count: 22