HIERARCHICAL RELATIONAL MODELS FOR DOCUMENT NETWORKS

Times cited: 121
Authors
Chang, Jonathan [1]
Blei, David M. [2]
Affiliations
[1] Facebook, Palo Alto, CA 94304 USA
[2] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
Funding
National Science Foundation (USA)
Keywords
Mixed-membership models; variational methods; text analysis; network models; inference
DOI
10.1214/09-AOAS309
Chinese Library Classification
O21 [Probability and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We develop the relational topic model (RTM), a hierarchical model of both network structure and node attributes. We focus on document networks, where the attributes of each document are its words, that is, discrete observations taken from a fixed vocabulary. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be used to summarize a network of documents, predict links between them, and predict words within them. We derive efficient inference and estimation algorithms based on variational methods that take advantage of sparsity and scale with the number of links. We evaluate the predictive performance of the RTM for large networks of scientific abstracts, web documents, and geographically tagged news.
Pages: 124-150
Page count: 27