Automatically Profiling the Author of an Anonymous Text

Cited by: 180
Authors
Argamon, Shlomo [1 ]
Koppel, Moshe
Pennebaker, James W. [2]
Schler, Jonathan
Affiliations
[1] IIT, Chicago, IL 60616 USA
[2] Univ Texas Austin, Liberal Arts & Chair, Dept Psychol, Austin, TX 78712 USA
Keywords
DOI
10.1145/1461928.1461959
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The authorship profiling problem is of growing importance in the global information environment; for example, it can help police identify characteristics of the perpetrator of a crime when there are specific suspects to consider. The approach applies machine learning to text categorization, using a corpus of training documents, each labeled according to its category for a particular profiling dimension. The study outlines the kinds of text features found most useful for authorship profiling. The two basic types of features are content-based features and style-based features, reflecting the fact that different populations might tend to write about different topics as well as express themselves differently about the same topic. The experimental setup covers four profiling problems: determining the author's gender, age, native language, and neuroticism level. The right combination of linguistic features and machine learning methods enables an automated system to effectively determine such aspects of an anonymous author.
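As a rough illustration of the approach the abstract describes, the sketch below combines style-based features (relative frequencies of function words) with content-based features (relative frequencies of the most common remaining words) and trains a classifier on labeled documents. Everything specific here is an assumption made for illustration and is not taken from the article: the ten-word function-word list, the toy training texts, the labels `group_a` and `group_b`, and the nearest-centroid classifier, which stands in for whatever learning method the authors actually used.

```python
from collections import Counter
import math
import re

# Hypothetical, very small function-word list standing in for style-based features.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "i", "you", "my", "we", "it"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_content_vocab(texts, size=20):
    """Pick the most frequent non-function words as content-based features."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if tok not in FUNCTION_WORDS)
    return [word for word, _ in counts.most_common(size)]


def extract_features(text, content_vocab):
    """Return relative frequencies: style features first, then content features."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    style = [counts[w] / total for w in FUNCTION_WORDS]
    content = [counts[w] / total for w in content_vocab]
    return style + content


def train_centroids(vectors, labels):
    """Average the feature vectors of each label (nearest-centroid training)."""
    per_label = Counter(labels)
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [v / per_label[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy labeled corpus for one profiling dimension (labels are placeholders).
train_texts = [
    "I love my garden and my flowers",
    "my flowers and my garden bloom",
    "the engine of the car and the gearbox",
    "the gearbox of the engine",
]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

vocab = build_content_vocab(train_texts)
centroids = train_centroids(
    [extract_features(t, vocab) for t in train_texts], train_labels
)
print(predict(centroids, extract_features("my garden flowers", vocab)))
```

In a realistic setting each profiling dimension (gender, age, native language, neuroticism level) would get its own labeled corpus and its own trained model, with far larger feature sets than this toy example uses.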
Pages: 119-123
Page count: 5