Short Text Classification in Twitter to Improve Information Filtering

Times cited: 365
Authors
Sriram, Bharath [1]
Fuhry, David [1]
Demir, Engin [1]
Ferhatosmanoglu, Hakan [1]
Demirbas, Murat
Affiliation
[1] Ohio State University, Department of Computer Science and Engineering, Columbus, OH 43210, USA
Source
SIGIR 2010: Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval | 2010
Keywords
Short text; classification; Twitter; feature selection
DOI
10.1145/1835449.1835643
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In microblogging services such as Twitter, users may become overwhelmed by the raw data. One solution to this problem is the classification of short text messages. Because short texts do not provide sufficient word occurrences, traditional classification approaches such as "bag-of-words" have limitations. To address this problem, we propose to use a small set of domain-specific features extracted from the author's profile and text. The proposed approach effectively classifies the text into a predefined set of generic classes such as News, Events, Opinions, Deals, and Private Messages.
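The abstract describes the approach only at a high level: a few domain-specific features drawn from the author and the tweet text, fed to a classifier over generic classes. The Python sketch that follows illustrates that idea under stated assumptions. The feature names, the tiny lexicons, the training tweets, and the use of a categorical Naive Bayes classifier with add-one smoothing are all hypothetical choices made for illustration; they are not taken from the abstract and should not be read as the authors' exact feature set or model.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tiny lexicons, for illustration only (not from the paper).
OPINION_WORDS = {"love", "hate", "awesome", "terrible", "best", "worst"}
TIME_WORDS = {"tonight", "tomorrow", "today", "weekend"}


def extract_features(author, text):
    """Map one tweet to a small set of categorical features (hypothetical set)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "author": author,  # author-profile feature: here just the user name
        "starts_with_mention": text.lstrip().startswith("@"),
        "has_mention_inside": bool(re.search(r"\s@\w+", text)),
        "has_currency_or_percent": bool(re.search(r"[$%]", text)),
        "has_time_word": any(t in TIME_WORDS for t in tokens),
        "has_opinion_word": any(t in OPINION_WORDS for t in tokens),
        # emphasis: an exclamation mark or a fully capitalized word of 3+ letters
        "has_emphasis": "!" in text or bool(re.search(r"\b[A-Z]{3,}\b", text)),
    }


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature][value] -> number of training rows
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen per feature, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for name, value in row.items():
                self.counts[label][name][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)  # log prior
            for name, value in row.items():
                seen = self.counts[label][name][value]
                # +1 in k reserves probability mass for a value never seen in training
                k = len(self.values[name]) + 1
                score += math.log((seen + 1) / (n_label + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled tweets: (author, text, class).
TRAIN = [
    ("reuters", "Parliament passes new budget bill", "News"),
    ("reuters", "Storm knocks out power across the region", "News"),
    ("alice", "Concert in the park tonight at 8 pm", "Events"),
    ("bob", "Game night tomorrow, everyone welcome", "Events"),
    ("carol", "I love this new phone, best purchase ever", "Opinions"),
    ("dave", "That movie was terrible", "Opinions"),
    ("shopbot", "50% off all shoes, only $20!", "Deals"),
    ("shopbot", "Flash SALE: laptops for $399", "Deals"),
    ("erin", "@frank are we still meeting later?", "PrivateMessages"),
    ("frank", "@erin yes, see you there", "PrivateMessages"),
]

MODEL = CategoricalNaiveBayes().fit(
    [extract_features(a, t) for a, t, _ in TRAIN],
    [label for _, _, label in TRAIN],
)

if __name__ == "__main__":
    print(MODEL.predict(extract_features("shopbot", "Headphones now $15, 30% off")))  # Deals
    print(MODEL.predict(extract_features("zoe", "@max thanks for the help")))  # PrivateMessages
```

Because each tweet is reduced to a handful of categorical values rather than a sparse word vector, the model needs only a few counts per class, which suits the setting the abstract describes, where individual words occur too rarely to be reliable evidence.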
Pages: 841-842
Number of pages: 2