On Predicting the Popularity of Newly Emerging Hashtags in Twitter

Cited by: 166
Authors
Ma, Zongyang [1]
Sun, Aixin [1]
Cong, Gao [1]
Affiliation
[1] Nanyang Technological University, School of Computer Engineering, Singapore 639798, Singapore
Source
Journal of the American Society for Information Science and Technology | 2013 / Vol. 64 / No. 7
Keywords
text mining; content filtering; automatic classification
DOI
10.1002/asi.22844
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future has become a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naive Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is identifying effective features for describing new hashtags. We extract 7 content features from the hashtag string and the collection of tweets containing the hashtag, and 11 contextual features from the social graph formed by the users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
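The abstract compares classifiers by the Micro-F1 measure. As a minimal sketch, assuming each hashtag receives exactly one popularity label (the label names used below are hypothetical, not the paper's actual class definitions), micro-averaged F1 pools true positives, false positives, and false negatives across all classes before computing precision and recall:

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label, multi-class predictions.

    Counts are pooled over every class that appears in either list;
    the popularity labels passed in are hypothetical examples.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    labels = set(y_true) | set(y_pred)
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correctly predicted this class
            elif p == label:
                fp += 1  # predicted this class, but the truth differs
            elif t == label:
                fn += 1  # missed an instance of this class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical popularity labels for four new hashtags.
truth = ["low", "high", "mid", "low"]
predicted = ["low", "mid", "mid", "low"]
print(micro_f1(truth, predicted))  # 0.75
```

For single-label multi-class prediction, every false positive for one class is also a false negative for another, so micro-averaged precision, recall, and F1 all reduce to accuracy; macro-averaged F1 would instead weight each popularity class equally regardless of its size.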
Pages: 1399–1410
Number of pages: 12