HIERARCHICAL RELATIONAL MODELS FOR DOCUMENT NETWORKS

Cited by: 121
Authors
Chang, Jonathan [1]
Blei, David M. [2]
Affiliations
[1] Facebook, Palo Alto, CA 94304 USA
[2] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
Funding
National Science Foundation (United States)
Keywords
Mixed-membership models; variational methods; text analysis; network models; inference
DOI
10.1214/09-AOAS309
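Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract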
We develop the relational topic model (RTM), a hierarchical model of both network structure and node attributes. We focus on document networks, where the attributes of each document are its words, that is, discrete observations taken from a fixed vocabulary. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be used to summarize a network of documents, predict links between them, and predict words within them. We derive efficient inference and estimation algorithms based on variational methods that take advantage of sparsity and scale with the number of links. We evaluate the predictive performance of the RTM for large networks of scientific abstracts, web documents, and geographically tagged news.
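To make the pairwise link model in the abstract concrete, here is a minimal sketch of a logistic link probability. It assumes the form used in the full paper, where the conditioning runs through each document's latent topic assignments: with z̄_d the empirical topic proportions of document d, the link probability is σ(ηᵀ(z̄_d ∘ z̄_d′) + ν), where ∘ is the element-wise product and η, ν are link parameters. The function names `mean_topic_assignments` and `link_probability`, and all parameter values below, are hypothetical illustrations rather than code from the authors.

```python
import math
from typing import Sequence


def mean_topic_assignments(assignments: Sequence[int], num_topics: int) -> list[float]:
    """Empirical topic proportions (z-bar): fraction of a document's words assigned to each topic.

    `assignments` holds one topic index per word (hypothetical input format).
    """
    if not assignments:
        raise ValueError("a document needs at least one word")
    counts = [0] * num_topics
    for k in assignments:
        counts[k] += 1
    n = len(assignments)
    return [c / n for c in counts]


def link_probability(
    zbar_d: Sequence[float],
    zbar_e: Sequence[float],
    eta: Sequence[float],
    nu: float,
) -> float:
    """Logistic link: sigma(eta^T (zbar_d * zbar_e) + nu), with * taken element-wise.

    Assumed form of the RTM logistic link; eta weights shared topics, nu is an intercept.
    """
    score = sum(h * a * b for h, a, b in zip(eta, zbar_d, zbar_e)) + nu
    return 1.0 / (1.0 + math.exp(-score))


# Usage with made-up topic assignments for three documents over 3 topics.
zbar_a = mean_topic_assignments([0, 0, 1, 2], num_topics=3)  # mostly topic 0
zbar_b = mean_topic_assignments([0, 0, 0, 1], num_topics=3)  # also mostly topic 0
zbar_c = mean_topic_assignments([2, 2], num_topics=3)        # only topic 2
eta, nu = [4.0, 4.0, 4.0], -1.0                              # illustrative parameters

print(link_probability(zbar_a, zbar_b, eta, nu))  # topically similar pair
print(link_probability(zbar_b, zbar_c, eta, nu))  # no shared topics, so only nu remains
```

Because the element-wise product is nonzero only on topics both documents use, a positive η raises the link probability for topically similar pairs, while a pair with no shared topics falls back to σ(ν). This sketch covers only the logistic form of the link.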
Pages: 124-150
Page count: 27