A Biterm Topic Model Incorporating Word Vector Features

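Cited by: 5
Authors
Liu Liangxuan (刘良选)
Huang Mengxing (黄梦醒)
Affiliation
[1] College of Information Science and Technology, Hainan University
Keywords
Topic model; latent Dirichlet allocation; short text; biterm topic model; word vector; Gibbs sampling
DOI
Not available
CLC number
TP391.1 [Text information processing]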
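Abstract
Short texts are inherently sparse in content and poor in contextual information. To address these problems, this paper proposes LF-BTM, a biterm topic model that incorporates word vector features and builds on the biterm topic model (BTM). The model introduces a latent feature model so that rich word vector information can compensate for content sparsity. In the revised generative process, each word of a biterm is generated under the joint influence of the topic-word multinomial distribution and the latent feature model. The model parameters are estimated with a Gibbs sampling algorithm. Experiments on real short-text datasets show that, by drawing on word vectors pre-trained on a large external general-purpose corpus, the model discovers topics with markedly higher semantic coherence.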
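As a rough illustration of the generative step described in the abstract, the sketch below extracts biterms from a short text and mixes a topic-word multinomial with an embedding-based latent feature component. The abstract gives no formulas, so the mixture form, the mixing weight `lam`, the topic vector `topic_vec`, the toy vocabulary, and all function names are assumptions made for illustration; this is not the authors' implementation, and the Gibbs sampling step is not shown.

```python
import math
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_word_distribution(phi_t, topic_vec, word_vecs, lam):
    """Mix a topic-word multinomial with a latent feature component.

    phi_t     : topic-word multinomial for one topic (sums to 1)
    topic_vec : hypothetical topic vector in the word-vector space
    word_vecs : pre-trained word vectors, one per vocabulary word
    lam       : hypothetical mixing weight in [0, 1] (an assumption)
    """
    # Latent feature component: softmax over topic-vector / word-vector dot products.
    scores = [sum(a * b for a, b in zip(topic_vec, vec)) for vec in word_vecs]
    latent = softmax(scores)
    # Convex combination of the two components, so the result still sums to 1.
    return [(1.0 - lam) * p + lam * q for p, q in zip(phi_t, latent)]


def biterm_likelihood(biterm, vocab, dist):
    """P(biterm | topic), assuming both words are drawn independently."""
    w1, w2 = biterm
    return dist[vocab[w1]] * dist[vocab[w2]]


# Toy data, invented purely for this example.
vocab = {"apple": 0, "phone": 1, "fruit": 2}
phi_t = [0.5, 0.4, 0.1]
word_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
topic_vec = [1.0, 0.0]

dist = mixed_word_distribution(phi_t, topic_vec, word_vecs, lam=0.6)
biterms = extract_biterms(["apple", "phone", "fruit"])
for b in biterms:
    print(b, biterm_likelihood(b, vocab, dist))
```

In this toy setup the word vectors pull probability mass toward words that lie close to the topic vector, which is the intuition behind using pre-trained embeddings to offset sparse co-occurrence counts in short texts.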
Pages: 2055-2058
Page count: 4