Accurate unlexicalized parsing

Cited by: 1007
Authors
Klein, D [1]
Manning, CD [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE | 2003
DOI
10.3115/1075096.1075150
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F-1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.
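As a rough illustration of what a "state split" does, the sketch below applies parent annotation, one well-known kind of split, to a single toy tree and compares relative-frequency rule estimates before and after. The tuple-based tree encoding, the helper names `parent_annotate` and `rule_probs`, and the toy sentence are assumptions made for this example; they are not the paper's code, grammar, or data, and the paper's full set of splits is not reproduced here.

```python
from collections import Counter, defaultdict

# Assumed encoding: a tree is (label, [children]); a word is a plain string.

def parent_annotate(tree, parent=None):
    """Split each phrasal category by its parent's label (NP under S -> NP^S)."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    # Preterminals (tags directly above a word) are left unsplit in this sketch.
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if parent is None or is_preterminal:
        new_label = label
    else:
        new_label = f"{label}^{parent}"
    # Children see the original (unsplit) label of their parent.
    return (new_label, [parent_annotate(c, label) for c in children])

def rule_probs(trees):
    """Relative-frequency estimates of P(rhs | lhs) for phrasal rules only."""
    counts = defaultdict(Counter)

    def visit(tree):
        if isinstance(tree, str):
            return
        label, children = tree
        # Skip tag -> word rules; count only rules over nonterminals.
        if all(not isinstance(c, str) for c in children):
            counts[label][tuple(c[0] for c in children)] += 1
        for c in children:
            visit(c)

    for t in trees:
        visit(t)
    return {
        lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
        for lhs, c in counts.items()
    }

# Hypothetical one-tree "treebank": subject NP is a pronoun, object NP is not.
toy = ("S", [
    ("NP", [("PRP", ["She"])]),
    ("VP", [("VBD", ["saw"]),
            ("NP", [("DT", ["the"]), ("NN", ["dog"])])]),
])

plain = rule_probs([toy])
split = rule_probs([parent_annotate(toy)])
```

In the unsplit grammar a single NP symbol shares its probability mass between the pronoun expansion and the determiner-noun expansion, regardless of where the NP occurs. After the split, NP^S and NP^VP carry separate distributions, which is the sense in which a split removes an independence assumption of the plain treebank grammar.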
Pages: 423-430
Number of pages: 8