A Survey of Modern Authorship Attribution Methods

Cited by: 752
Authors
Stamatatos, Efstathios [1]
Affiliation
[1] Univ Aegean, Dept Informat & Commun Syst Engn, Karlovassi 83200, Samos, Greece
Source
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY | 2009 / Vol. 60 / Issue 3
Keywords
WRITING-STYLE; TEXT; IDENTIFICATION;
DOI
10.1002/asi.21001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Authorship attribution supported by statistical or computational methods has a long history starting from the 19th century and is marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed "Federalist Papers." During the last decade, this scientific field has been developed substantially, taking advantage of research advances in areas such as machine learning, information retrieval, and natural language processing. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code, etc.) indicates a wide variety of applications of this technology, provided it is able to handle short and noisy text from multiple candidate authors. In this article, a survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification. The focus of this survey is on computational requirements and settings rather than on linguistic or literary issues. We also discuss evaluation methodologies and criteria for authorship attribution studies and list open questions that will attract future work in this area.
Pages: 538-556
Page count: 19