Fame for sale: Efficient detection of fake Twitter followers

Times cited: 238

Authors:

Cresci, Stefano [1, 2]

Di Pietro, Roberto [2, 3]

Petrocchi, Marinella [1]

Spognardi, Angelo [1, 4]

Tesconi, Maurizio [1]

Affiliations:

[1] CNR, IIT, I-56124 Pisa, Italy

[2] Alcatel Lucent, Bell Labs, Paris, France

[3] Univ Padua, Maths Dept, Padua, Italy

[4] Tech Univ Denmark, DTU Compute, DK-2800 Lyngby, Denmark

Source:

DECISION SUPPORT SYSTEMS | 2015 / Vol. 80

Keywords:

Twitter; Fake followers; Anomalous account detection; Baseline dataset; Machine learning; ACCOUNTS; SYSTEM;

DOI:

10.1016/j.dss.2015.09.003

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Fake followers are Twitter accounts created specifically to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere, thereby affecting the economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for anomalous Twitter account detection. Second, we create a baseline dataset of verified human and fake follower accounts. This baseline dataset is publicly available to the scientific community. Then, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers both to reduce overfitting and to reduce the cost of gathering the data needed to compute the features. The final result is a novel Class A classifier, general enough to thwart overfitting, lightweight thanks to its use of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. Finally, we perform an information fusion-based sensitivity analysis to assess the global sensitivity of each of the features employed by the classifier. Besides being supported by a thorough experimental methodology and being interesting in their own right, the findings reported in this paper also pave the way for further investigation of the novel issue of fake Twitter followers. (C) 2015 Published by Elsevier B.V.
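The abstract reports that the final classifier correctly classifies more than 95% of the training accounts, which is its accuracy on the labeled baseline dataset. As a minimal sketch, assuming binary labels where 1 means "fake follower" and 0 means "human" (an encoding chosen here for illustration, not taken from the paper), the usual confusion-matrix metrics for such a classifier can be computed as below. The function names `confusion_counts` and `evaluate`, and the particular set of metrics (accuracy, precision, recall, F-measure, Matthews correlation coefficient), are hypothetical illustrations, not the authors' evaluation code.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming label 1 = fake follower, 0 = human."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 (human) or 1 (fake follower)")
        if t == 1 and p == 1:
            tp += 1  # fake follower correctly flagged
        elif t == 0 and p == 0:
            tn += 1  # human correctly left alone
        elif t == 0 and p == 1:
            fp += 1  # human wrongly flagged as fake
        else:
            fn += 1  # fake follower missed
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute standard binary-classification metrics from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    # Accuracy: share of all accounts classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Matthews correlation coefficient: robust to unbalanced classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(evaluate(truth, preds))
```

On the toy labels, 6 of 8 accounts are classified correctly (accuracy 0.75) and the MCC is 0.5. Reporting MCC alongside accuracy is a design choice worth considering when the human and fake classes in a dataset are of very different sizes, since accuracy alone can look high for a classifier that favors the majority class.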

Pages: 56-71

Page count: 16

Number of references: 47
