Convergence results for the EM approach to mixtures of experts architectures

Cited by: 151
Authors
Jordan, MI
Xu, L
Keywords
supervised learning; statistical models; maximum likelihood; EM algorithm; convergence rate; mixture models; hierarchical models; optimization
DOI
10.1016/0893-6080(95)00014-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
Pages: 1409-1431
Page count: 23
Related papers
21 in total
[1]   GROWTH TRANSFORMATIONS FOR FUNCTIONS ON MANIFOLDS [J].
BAUM, LE ;
SELL, GR .
PACIFIC JOURNAL OF MATHEMATICS, 1968, 27 (02) :211-&
[2]   A MAXIMIZATION TECHNIQUE OCCURRING IN STATISTICAL ANALYSIS OF PROBABILISTIC FUNCTIONS OF MARKOV CHAINS [J].
BAUM, LE ;
PETRIE, T ;
SOULES, G ;
WEISS, N .
ANNALS OF MATHEMATICAL STATISTICS, 1970, 41 (01) :164-&
[3]  
Breiman L, 2017, CLASSIFICATION REGRE, P368, DOI 10.1201/9781315139470
[4]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[5]  
DEVEAUX RD, 1986, THESIS STANFORD U, V274
[6]   MULTIVARIATE ADAPTIVE REGRESSION SPLINES [J].
FRIEDMAN, JH .
ANNALS OF STATISTICS, 1991, 19 (01) :1-67
[7]   Adaptive Mixtures of Local Experts [J].
Jacobs, Robert A. ;
Jordan, Michael I. ;
Nowlan, Steven J. ;
Hinton, Geoffrey E. .
NEURAL COMPUTATION, 1991, 3 (01) :79-87
[8]   HIERARCHICAL MIXTURES OF EXPERTS AND THE EM ALGORITHM [J].
JORDAN, MI ;
JACOBS, RA .
NEURAL COMPUTATION, 1994, 6 (02) :181-214
[9]  
JORDAN MI, 1992, ADV NEUR IN, V4, P985
[10]  
LITTEL RJA, 1987, STATISTICAL ANAL MIS