Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

Cited by: 908
Authors
Schwartz, H. Andrew [1, 2]
Eichstaedt, Johannes C. [1]
Kern, Margaret L. [1]
Dziurzynski, Lukasz [1]
Ramones, Stephanie M. [1]
Agrawal, Megha [1, 2]
Shah, Achal [2]
Kosinski, Michal [3]
Stillwell, David [3]
Seligman, Martin E. P. [1]
Ungar, Lyle H. [2]
Affiliations
[1] University of Pennsylvania, Positive Psychology Center, Philadelphia, PA 19104 USA
[2] University of Pennsylvania, Philadelphia, PA 19104 USA
[3] University of Cambridge, Psychometrics Centre, Cambridge, England
Source
PLOS ONE | 2013 / Volume 8 / Issue 9
Keywords
WORDS; LIFE; REGRESSION; POSITIONS; TAXONOMY; MODEL; POWER; WEB;
DOI
10.1371/journal.pone.0073791
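Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Discipline classification code
07; 0710; 09;
Abstract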
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase 'sick of' and the word 'depressed'), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive 'my' when mentioning their 'wife' or 'girlfriend' more often than females use 'my' with 'husband' or 'boyfriend'). To date, this represents the largest study, by an order of magnitude, of language and personality.
Pages: 16