RSC: Mining and Modeling Temporal Activity in Social Media

Cited by: 58
Authors
Costa, Alceu Ferraz [1]
Yamaguchi, Yuto [2]
Machado Traina, Agma Juci [1]
Traina, Caetano, Jr. [1]
Faloutsos, Christos [3]
Affiliations
[1] Univ Sao Paulo, Dept Comp Sci, Sao Paulo, Brazil
[2] Univ Tsukuba, Tsukuba, Ibaraki, Japan
[3] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Source
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING | 2015
Funding
Sao Paulo Research Foundation (Brazil); National Science Foundation (USA)
Keywords
Social Media; Time-Series; User Behavior; Generative Model; Heavy Tails
DOI
10.1145/2783258.2783294
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Can we identify patterns of temporal activities caused by human communications in social media? Is it possible to model these patterns and tell if a user is a human or a bot based only on the timing of their postings? Social media services allow users to make postings, generating large datasets of human activity time-stamps. In this paper we analyze time-stamp data from social media services and find that the distribution of posting inter-arrival times (IAT) is characterized by four patterns: (i) positive correlation between consecutive IATs, (ii) heavy tails, (iii) periodic spikes, and (iv) a bimodal distribution. Based on our findings, we propose Rest-Sleep-and-Comment (RSC), a generative model that is able to match all four discovered patterns. We demonstrate the utility of RSC by showing that it can accurately fit real time-stamp data from Reddit and Twitter. We also show that RSC can be used to spot outliers and detect users with non-human behavior, such as bots. We validate RSC using real data consisting of over 35 million postings from Twitter and Reddit. RSC consistently provides a better fit to real data and clearly outperforms existing models for human dynamics. RSC was also able to detect bots with a precision higher than 94%.
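As an illustration of the first pattern named in the abstract (positive correlation between consecutive IATs), the following is a minimal sketch, not the authors' code. It assumes time-stamps are given as numbers in seconds. The function names and the choice to correlate the logarithms of the IATs are assumptions made here for illustration; the logarithm is used only because heavy-tailed gaps would otherwise be dominated by a few very long pauses.

```python
import math


def inter_arrival_times(timestamps):
    """Return the gaps between consecutive time-stamps after sorting them."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def consecutive_iat_correlation(iats):
    """Correlate log(IAT_i) with log(IAT_{i+1}).

    Assumption for this sketch: zero-length gaps are dropped so the
    logarithm is defined, which slightly changes which gaps are adjacent.
    """
    logs = [math.log(gap) for gap in iats if gap > 0]
    if len(logs) < 3:
        raise ValueError("need at least three positive IATs")
    return pearson(logs[:-1], logs[1:])


if __name__ == "__main__":
    # Hypothetical posting times (seconds) for one user.
    posts = [0, 10, 30, 70, 150]
    gaps = inter_arrival_times(posts)
    print(gaps, consecutive_iat_correlation(gaps))
```

A value clearly above zero for a user's posting history would be consistent with pattern (i); the sketch says nothing about the other three patterns or about the RSC model itself.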
Pages: 269-278
Page count: 10