Private traits and attributes are predictable from digital records of human behavior

Cited by: 1305
Authors
Kosinski, Michal [1]
Stillwell, David [1]
Graepel, Thore [2]
Affiliations
[1] Univ Cambridge, Psychometr Ctr, Cambridge CB2 3RQ, England
[2] Microsoft Res, Cambridge CB1 2FB, England
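Keywords
social networks; computational social science; machine learning; big data; data mining; psychological assessment; PERSONALITY; LIFE; SATISFACTION
DOI
10.1073/pnas.1218772110
Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract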
We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait "Openness," prediction accuracy is close to the test-retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.
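The "correctly discriminates ... in 88% of cases" figures can be read as a pairwise discrimination rate: the share of (member of group A, member of group B) pairs in which the model scores the group A member higher, which is equivalent to the area under the ROC curve. The sketch below illustrates that statistic only. The function name and the scores are hypothetical, invented for the example, and are not data or code from the study.

```python
def pairwise_discrimination(scores_a, scores_b):
    """Fraction of (a, b) pairs where the group-A score is higher.

    Ties count as half a correct pair, which makes the result
    equal to the area under the ROC curve (AUC).
    """
    if not scores_a or not scores_b:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


# Hypothetical model scores, made up for illustration only.
group_a = [0.9, 0.8, 0.6, 0.4]  # people who do have the attribute
group_b = [0.7, 0.3, 0.2, 0.1]  # people who do not

rate = pairwise_discrimination(group_a, group_b)
print(rate)  # 14 of the 16 pairs are ordered correctly
```

A rate of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so under this reading a value such as 0.88 describes how well the model orders people and is not the fraction of individuals assigned the correct label.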
Pages: 5802-5805
Number of pages: 4