Structure learning in conditional probability models via an entropic prior and parameter extinction

Cited by: 75
Authors: Brand, M [1]
Affiliation: [1] Mitsubishi Electric Research Laboratories, Cambridge Research Center, Cambridge, MA 02139 USA
DOI: 10.1162/089976699300016395
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
We introduce an entropic prior for multinomial parameter estimation problems and solve for its maximum a posteriori (MAP) estimator. The prior is a bias for maximally structured and minimally ambiguous models. In conditional probability models with hidden state, iterative MAP estimation drives weakly supported parameters toward extinction, effectively turning them off. Thus, structure discovery is folded into parameter estimation. We then establish criteria for simplifying a probabilistic model's graphical structure by trimming parameters and states, with a guarantee that any such deletion will increase the posterior probability of the model. Trimming accelerates learning by sparsifying the model. All operations monotonically and maximally increase the posterior probability, yielding structure-learning algorithms only slightly slower than parameter estimation via expectation-maximization and orders of magnitude faster than search-based structure induction. When applied to hidden Markov model training, the resulting models show superior generalization to held-out test data. In many cases the resulting models are so sparse and concise that they are interpretable, with hidden states that strongly correlate with meaningful categories.
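The extinction effect described in the abstract is easy to reproduce numerically. The sketch below (Python with NumPy/SciPy; the function name `entropic_map` and the evidence vector `omega` are invented for illustration) computes the entropic-prior MAP estimate for a single multinomial by direct constrained optimization of the log-posterior, rather than via the paper's closed-form fixed-point solution; it is a minimal illustration under those assumptions, not the paper's training algorithm.

```python
# Minimal sketch of entropic-prior MAP estimation for one multinomial.
# Assumption: the entropic prior is P(theta) ~ exp(-H(theta)) = prod_i theta_i^theta_i,
# so with evidence counts omega the log-posterior is, up to a constant,
#     sum_i (omega_i + theta_i) * log(theta_i),
# maximized over the probability simplex.
import numpy as np
from scipy.optimize import minimize

def entropic_map(omega, eps=1e-9):
    """MAP estimate of multinomial parameters under the entropic prior."""
    omega = np.asarray(omega, dtype=float)
    k = len(omega)

    def neg_log_post(theta):
        theta = np.clip(theta, eps, 1.0)  # keep log() well defined
        return -np.sum((omega + theta) * np.log(theta))

    res = minimize(
        neg_log_post,
        x0=np.full(k, 1.0 / k),  # start at the uniform distribution
        method="SLSQP",
        bounds=[(eps, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda t: t.sum() - 1.0}],
    )
    return res.x

if __name__ == "__main__":
    omega = np.array([10.0, 7.0, 1.0, 0.2])  # hypothetical evidence counts
    print("ML :", np.round(omega / omega.sum(), 4))
    print("MAP:", np.round(entropic_map(omega), 4))
```

Compared with the maximum-likelihood estimate, the entropic MAP should shift mass toward well-supported outcomes and push weakly supported parameters toward zero; in the paper's iterative setting with hidden state, that bias drives such parameters to extinction, at which point the trimming criteria delete them outright.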
Pages: 1155-1182 (28 pages)