Topic Models for Unsupervised Cluster Matching

Cited by: 13
Authors
Iwata, Tomoharu [1]
Hirao, Tsutomu [1]
Ueda, Naonori [1]
Affiliation
[1] NTT Commun Sci Labs, 2-4 Hikaridai, Seika, Kyoto 6190237, Japan
Keywords
Topic modeling; unsupervised object matching; clustering
DOI
10.1109/TKDE.2017.2778720
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose topic models for unsupervised cluster matching, the task of finding matches between clusters in different domains without correspondence information. For example, the proposed model finds correspondences between document clusters in English and German without alignment information such as dictionaries or parallel sentences/documents. The proposed model assumes that documents in all languages share a common latent topic structure, and that there is a potentially infinite number of topic proportion vectors in a latent topic space shared by all languages. Each document is generated using one of the topic proportion vectors and language-specific word distributions. By inferring the topic proportion vector used for each document, we can assign documents in different languages to common clusters, where each cluster is associated with a topic proportion vector. Documents assigned to the same cluster are considered to be matched. We develop an efficient inference procedure for the proposed model based on collapsed Gibbs sampling. The effectiveness of the proposed model is demonstrated on real data sets, including multilingual corpora of Wikipedia articles and product reviews.
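The generative story in the abstract can be illustrated with a short forward-sampling sketch. This is not the authors' code and not their collapsed Gibbs sampler. The abstract does not state the priors, so the sketch assumes a Chinese restaurant process for the "potentially infinite" set of topic proportion vectors and symmetric Dirichlet priors for topic proportions and word distributions. The function name `generate_corpus` and all parameter names and default values are hypothetical.

```python
import numpy as np

def generate_corpus(n_docs_per_lang=(5, 5), vocab_sizes=(30, 40), n_topics=4,
                    doc_len=20, alpha=1.0, beta=0.5, gamma=1.0, seed=0):
    """Forward-sample a toy multilingual corpus (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Language-specific word distributions: one (n_topics x vocab) matrix per language.
    phi = [rng.dirichlet(np.full(v, beta), size=n_topics) for v in vocab_sizes]
    thetas = []   # topic proportion vectors shared by all languages, one per cluster
    counts = []   # number of documents (from any language) in each cluster
    docs, clusters = [], []
    for lang, n_docs in enumerate(n_docs_per_lang):
        for _ in range(n_docs):
            # Assumed Chinese restaurant process: join an existing cluster with
            # probability proportional to its size, or open a new one with weight gamma.
            weights = np.array(counts + [gamma], dtype=float)
            c = int(rng.choice(len(weights), p=weights / weights.sum()))
            if c == len(thetas):
                thetas.append(rng.dirichlet(np.full(n_topics, alpha)))
                counts.append(0)
            counts[c] += 1
            # Draw a topic per token from the cluster's proportions, then a word
            # from the language-specific distribution of that topic.
            topics = rng.choice(n_topics, size=doc_len, p=thetas[c])
            words = np.array([rng.choice(vocab_sizes[lang], p=phi[lang][k])
                              for k in topics])
            docs.append((lang, words))
            clusters.append(c)
    return docs, clusters, thetas
```

Under these assumptions, documents from different languages that share a cluster index were generated from the same topic proportion vector, which is the sense in which the abstract calls them matched. Inference would run in the opposite direction, recovering the cluster assignments from the words alone.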
Pages: 786-795
Number of pages: 10