Deep neural networks for bot detection

Cited by: 261
Authors
Kudugunta, Sneha [1]
Ferrara, Emilio [2]
Affiliations
[1] Indian Inst Technol, Hyderabad, Hyderabad, India
[2] USC Informat Sci Inst, Marina Del Rey, CA 90292 USA
Keywords
Social media networks; Web and social media; Social bots; Deep learning; Deep neural networks; Social media
DOI
10.1016/j.ins.2018.08.019
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The problem of detecting bots, automated social media accounts governed by software but disguised as human users, has strong implications. For example, bots have been used to sway political elections by distorting online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that may have caused health epidemics. Most techniques proposed to date detect bots at the account level, by processing large amounts of social media posts and leveraging information from network structure, temporal dynamics, sentiment analysis, etc. In this paper, we propose a deep neural network based on a contextual long short-term memory (LSTM) architecture that exploits both content and metadata to detect bots at the tweet level: contextual features are extracted from user metadata and fed as auxiliary input to LSTM deep nets processing the tweet text. Another contribution that we make is proposing a technique based on synthetic minority oversampling to generate a large labeled dataset, suitable for deep nets training, from a minimal amount of labeled data (roughly 3000 examples of sophisticated Twitter bots). We demonstrate that, from just one single tweet, our architecture can achieve high classification accuracy (AUC > 96%) in separating bots from humans. We apply the same architecture to account-level bot detection, achieving nearly perfect classification accuracy (AUC > 99%). Our system outperforms the previous state of the art while leveraging a small and interpretable set of features and requiring minimal training data. (C) 2018 Elsevier Inc. All rights reserved.
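The abstract mentions a technique based on synthetic minority oversampling but does not spell out its steps. As a rough illustration only, the sketch below shows plain SMOTE-style interpolation between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for this example; this is not the authors' exact procedure.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE sketch: interpolate between minority samples and
    their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors standing in for a small minority class.
bots = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(bots, n_synthetic=10, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new data stays inside the region already covered by the labeled examples.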
Pages: 312-322
Page count: 11