Studying User Footprints in Different Online Social Networks

Cited by: 122
Authors
Malhotra, Anshu [1]
Totti, Luam [2]
Meira, Wagner, Jr. [2]
Kumaraguru, Ponnurangam [1]
Almeida, Virgilio [2]
Affiliations
[1] Indraprastha Inst Informat Technol, New Delhi, India
[2] Univ Fed Minas Gerais, Dept Comp Sci, BR-31270 Belo Horizonte, MG, Brazil
Source
2012 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM) | 2012
DOI
10.1109/ASONAM.2012.184
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
With the growing popularity and usage of online social media services, people now have accounts (sometimes several) on multiple and diverse services like Facebook, LinkedIn, Twitter, and YouTube. Publicly available information can be used to create a digital footprint of any user of these social media services. Generating such digital footprints can be very useful for personalization, profile management, and detecting malicious user behavior. A very important application of analyzing users' online digital footprints is to protect users from potential privacy and security risks arising from the large amount of publicly available user information. We extracted information about user identities on different social networks through Social Graph API, FriendFeed, and Profilactic; we collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate the digital footprints of the user. We applied context-specific techniques (e.g., Jaro-Winkler similarity, WordNet-based ontologies) to measure the similarity of the user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers for disambiguating profiles belonging to the same user from different social networks. UserID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we achieved accuracy, precision, and recall of 98%, 99%, and 96%, respectively.
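The abstract names Jaro-Winkler similarity as one of the metrics used to compare profile fields. The abstract does not give the parameters or preprocessing the authors used, so the following is only a minimal sketch of the standard Jaro-Winkler definition, assuming the commonly used prefix scale of 0.1 and a maximum prefix length of 4. The function names and the sample usernames are hypothetical and are not taken from the paper.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed default parameters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


# Hypothetical usernames for the same person on two services.
# Lowercasing before comparison is an assumption, not a step stated in the abstract.
score = jaro_winkler("john.smith".lower(), "JohnSmith82".lower())
print(f"username similarity: {score:.3f}")
```

On the textbook pair "MARTHA" and "MARHTA", this sketch gives a Jaro score of about 0.944 and a Jaro-Winkler score of about 0.961. In a pipeline like the one the abstract describes, a score of this kind for each field would be one input feature to the classifier rather than a decision on its own.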
Pages: 1065-1070
Page count: 6