Constructing Corpora for the Development and Evaluation of Paraphrase Systems

Cited by: 36
Authors
Cohn, Trevor [1]
Callison-Burch, Chris [2]
Lapata, Mirella [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
[2] Johns Hopkins Univ, Ctr Speech & Language Proc, Baltimore, MD 21218 USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); US National Science Foundation (NSF)
Keywords
Natural language processing systems;
DOI
10.1162/coli.08-003-R1-07-044
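Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligent Science and Technology]
Abstract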
Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.
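The abstract mentions scoring paraphrase systems by precision, recall, and F1 against the annotated alignments but does not say how the scores are computed. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: each alignment is treated as a plain set of (source index, target index) word links, and the function name `alignment_prf` and the sample data are hypothetical.

```python
def alignment_prf(predicted, reference):
    """Return (precision, recall, F1) for two collections of alignment links.

    Assumption: an alignment is a set of (source_index, target_index) pairs
    and a predicted link counts as correct only if it appears in the reference.
    """
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)

    # Guard against empty sets so the ratios are always defined.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: four gold links, three predicted links, two in common.
reference = {(0, 0), (1, 1), (2, 3), (3, 2)}
predicted = {(0, 0), (1, 1), (2, 2)}
precision, recall, f1 = alignment_prf(predicted, reference)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A real evaluation may weight links differently (for example, by distinguishing confident from uncertain annotations); this sketch treats every link equally.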
Pages: 597-614
Page count: 18