Probabilistic Topic Models

Times cited: 197
Authors
Blei, David [1]
Carin, Lawrence [2]
Dunson, David [3]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] Signal Innovat Grp Inc, Durham, NC USA
[3] Duke Univ, Durham, NC 27706 USA
Keywords
Analytical models; Data models; Graphical models; Computational modeling; Bayesian methods; Markov processes; Inference
DOI
10.1109/MSP.2010.938079
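Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809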
Abstract
In this article, we review probabilistic topic models: graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words. Those distributions are called "topics" because, when fit to data, they capture the salient themes that run through the collection. We describe both finite-dimensional parametric topic models and their Bayesian nonparametric counterparts, which are based on the hierarchical Dirichlet process (HDP). We discuss two extensions of topic models to time-series data: one that lets the topics slowly change over time and one that lets the assumed prevalence of the topics change. Finally, we illustrate the application of topic models to nontext data, summarizing some recent research results in image analysis. © 2010 IEEE.
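The abstract's central idea is that each topic is a distribution over words and each document mixes several topics. This can be illustrated by simulating the generative process of latent Dirichlet allocation (LDA), the standard finite-dimensional topic model. The sketch below is not code from the article. It assumes symmetric Dirichlet priors with hypothetical hyperparameters `alpha` (document-topic) and `eta` (topic-word), a toy vocabulary, and fixed-length documents. It covers only data generation, not the posterior inference needed to fit topics to a real collection.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_length, num_topics, vocab, alpha, eta, seed=0):
    """Simulate an LDA-style generative process (illustrative sketch only).

    Assumed, hypothetical setup: symmetric Dirichlet priors alpha and eta,
    and every document has exactly doc_length words.
    Returns (topics, corpus): topics[k] is a distribution over vocab, and
    corpus[d] is a list of (word, topic_index) pairs.
    """
    rng = random.Random(seed)
    # Each topic is a probability distribution over the whole vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Per-document topic proportions.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_length):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta, k=1)[0]
            w = rng.choices(vocab, weights=topics[z], k=1)[0]
            doc.append((w, z))
        corpus.append(doc)
    return topics, corpus
```

Fitting a topic model runs this process in reverse. Given only the observed words, inference estimates the topics and the per-document proportions. The Bayesian nonparametric (HDP) variants mentioned in the abstract also let the data determine the number of topics, which is fixed here as `num_topics`.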
Pages: 55-65
Number of pages: 11
References: 63