Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates

Cited by: 67
Authors
Ormoneit, D [1 ]
Tresp, V
Affiliations
[1] Tech Univ Munchen, Dept Comp Sci, D-81730 Munchen, Germany
[2] Univ Calif San Diego, Dept Econ, La Jolla, CA 92037 USA
[3] Siemens AG, Corp Technol, Dept Informat & Commun, D-81730 Munchen, Germany
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1998, Vol. 9, No. 4
Keywords
bagging; Bayesian inference; data augmentation; EM algorithm; ensemble averaging; Gaussian mixture model; Gibbs sampling; penalized likelihood; probability density estimation;
DOI
10.1109/72.701177
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We apply the idea of averaging ensembles of estimators to probability density estimation. In particular, we use Gaussian mixture models, which are important components in many neural-network applications. We investigate the performance of averaging using three data sets. For comparison, we employ two traditional regularization approaches, i.e., a maximum penalized likelihood approach and a Bayesian approach. In the maximum penalized likelihood approach we use penalty functions derived from conjugate Bayesian priors such that an expectation maximization (EM) algorithm can be used for training. In all experiments, the maximum penalized likelihood approach and averaging improved performance considerably compared to a maximum likelihood approach. In two of the experiments, the maximum penalized likelihood approach outperformed averaging. In one experiment averaging was clearly superior. Our conclusion is that maximum penalized likelihood gives good results if the penalty term in the cost function is appropriate for the particular problem. If this is not the case, averaging is superior since it shows greater robustness by not relying on any particular prior assumption. The Bayesian approach worked very well on a low-dimensional toy problem but failed to give good performance in higher-dimensional problems.
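As an illustration of the averaging idea summarized in the abstract, below is a minimal sketch (not the authors' original code) of bagging-style ensemble averaging for Gaussian mixture density estimation. It uses scikit-learn's GaussianMixture as a stand-in mixture trainer; the toy data, ensemble size, and component count are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy 1-D sample from a two-component Gaussian mixture
# (a stand-in for the paper's data sets).
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal(1.5, 1.0, 300)])[:, None]

n_models, n_components = 10, 5  # illustrative choices
models = []
for _ in range(n_models):
    # Bagging-style resampling: fit each mixture on a bootstrap replicate.
    boot = data[rng.integers(0, len(data), size=len(data))]
    models.append(GaussianMixture(n_components=n_components).fit(boot))

# The averaged estimate is the pointwise mean of the member densities
# (score_samples returns log-densities, hence the exp).
grid = np.linspace(-5.0, 5.0, 200)[:, None]
avg_density = np.mean([np.exp(m.score_samples(grid)) for m in models], axis=0)
```

Note that the ensemble averages the estimated densities themselves rather than the mixture parameters, so no correspondence between components of different members is needed and a single poorly fit member has only a diluted effect on the combined estimate.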
Pages: 639-650
Page count: 12