Feature instability as a criterion for selecting potential style markers

Cited by: 33
Authors
Koppel, Moshe [1]
Akiva, Navot [1]
Dagan, Ido [1]
Affiliation
[1] Bar Ilan Univ, Dept Comp Sci, IL-52900 Ramat Gan, Israel
Source
Journal of the American Society for Information Science and Technology | 2006 / Vol. 57 / No. 11
DOI
10.1002/asi.20428
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We introduce a new measure on linguistic features, called stability, which captures the extent to which a language element such as a word or a syntactic construct is replaceable by semantically equivalent elements. This measure may be perceived as quantifying the degree of available "synonymy" for a language item. We show that frequent, but unstable, features are especially useful as discriminators of an author's writing style.
Pages: 1519-1525
Number of pages: 7