Micro-blog in China: identify influential users and automatically classify posts on Sina micro-blog

Cited by: 26
Authors
Wu, Xinmiao [1]
Wang, Jianmin [1]
Affiliation
[1] Sun Yat Sen Univ, Dept Informat Sci & Technol, Guangzhou 510275, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Micro-blog; Social network; Ranking; Classification; Data mining;
DOI
10.1007/s12652-012-0121-3
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification code
140502 [Artificial intelligence];
Abstract
Sina micro-blog (Weibo) is the first micro-blogging service in China and has grown rapidly over the past two years. This paper first studies the characteristics of the Sina online social network and then focuses on two problems: identifying influential users and automatically classifying micro-blog posts. In a dataset prepared for this study, we find, among other properties, an approximately power-law follower distribution, a non-power-law friend distribution, and a logarithmic correlation between a user's follower count and tweet count. To find the most popular users, we propose an algorithm called XinRank and compare it with two other algorithms. The results show that XinRank produces a different ranking and offers a new perspective for finding influential users. In addition, XinRank is dynamic and stable, which distinguishes it from, and makes it better than, the other two algorithms. We also attempt to automatically classify a single Chinese micro-blog post into a set of high-level categories using a naive Bayes classifier. Our research indicates that even though the average Chinese micro-blog post is only 28 words long, posts can be assigned to one of eight categories with an average performance of up to 84.2% using our proposed process. Toward the end of the paper, we address the problem of automatically discovering user interests. Finally, we combine XinRank and our micro-blog classifier to propose an interest-based influence ranking model.
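The abstract does not specify the classifier's features, smoothing, or preprocessing. As a hedged sketch only, the classification step could look like a plain multinomial naive Bayes with Laplace (add-alpha) smoothing, assuming each post has already been segmented into words (Chinese text has no spaces, so a word segmenter is needed first and is not shown). The class name `MultinomialNaiveBayes`, the category names, and the tokens below are hypothetical placeholders, not the paper's eight categories, data, or exact process.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-alpha (Laplace) smoothing.

    Assumes every post is already a list of word tokens; Chinese word
    segmentation must happen before this step and is out of scope here.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_doc_counts = Counter()        # training posts per category
        self.word_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()                       # all tokens seen in training

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label | tokens) = log P(label) + sum log P(token | label)."""
        total_docs = sum(self.class_doc_counts.values())
        score = math.log(self.class_doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok not in self.vocab:
                continue  # skip words never seen during training
            score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        """Return the category with the highest log posterior."""
        return max(self.class_doc_counts, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical toy training data: pre-segmented posts with placeholder categories.
train_docs = [
    ["match", "goal", "team"],
    ["team", "win", "league"],
    ["stock", "market", "price"],
    ["price", "bank", "rate"],
]
train_labels = ["sports", "sports", "finance", "finance"]

model = MultinomialNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
print(model.predict(["goal", "league"]))  # sports
print(model.predict(["price", "rate"]))   # finance
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen-in-category word from zeroing out a category, which matters for very short posts like the 28-word average reported above.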
Pages: 51-63
Page count: 13