Head-driven statistical models for natural language parsing

Cited by: 121
Authors
Collins, M [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
DOI
10.1162/089120103322753356
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.
Pages: 589-637
Page count: 49