Dynamic topic modeling via self-aggregation for short text streams

Cited by: 19
Authors
Shi, Lei [1]
Du, Junping [1]
Liang, Meiyu [1]
Kou, Feifei [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing Key Lab Intelligent Telecommun Software &, Beijing 100876, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Dynamic topic modeling; Self-aggregation; Sparsity problem; Social networks;
DOI
10.1007/s12083-018-0692-7
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Social networks such as Twitter, Facebook, and Sina microblogs have emerged as major sources for discovering and sharing the latest topics. Because social network topics change quickly, an effective method for modeling them is urgently needed. However, topic modeling is challenging because of the sparsity problem and the dynamic change of topics in microblog streams. In this study, we propose dynamic topic modeling via a self-aggregation method (SADTM) that captures the time-varying aspect of topic distributions and resolves the sparsity problem. To overcome the sparsity of short texts, the SADTM aggregates the observable, unordered short texts into aggregated documents without relying on external context. Furthermore, we use word pairs instead of single words for each microblog to generate more word co-occurrence patterns. The SADTM models temporal dynamics by using the topic distributions at previous time steps in the microblog stream to infer the current topics from sequential data. Extensive experiments on a real-world Sina microblog dataset demonstrate that the SADTM algorithm outperforms several state-of-the-art methods in topic coherence and cluster quality. Additionally, when applied in a search scenario, the SADTM significantly outperforms all baseline methods in the quality of the search results.
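The word-pair idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how SADTM constructs its word pairs, so the code below assumes the common convention of taking every unordered pair of distinct words in one short text; the function name `word_pairs` and the sample post are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def word_pairs(tokens):
    """Return every unordered pair of distinct words in one short text.

    Assumption: pairs are formed over the whole text, with no window limit.
    This is an illustrative convention, not the paper's stated procedure.
    """
    unique = sorted(set(tokens))          # drop repeats, fix a stable order
    return list(combinations(unique, 2))  # n distinct words -> n*(n-1)/2 pairs


# Hypothetical tokenized microblog post.
post = ["earthquake", "relief", "donation", "volunteers"]
pairs = word_pairs(post)
print(len(set(post)), "distinct words ->", len(pairs), "word pairs")
```

Under this assumption a post with four distinct words yields six co-occurrence units rather than four single-word observations, which is the sense in which word pairs provide more co-occurrence patterns for sparse short texts.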
Pages: 1403-1417
Number of pages: 15