Algorithms for Deterministic Incremental Dependency Parsing

被引:171
作者
Nivre, Joakim [1 ,2 ]
机构
[1] Vaxjo Univ, Sch Math & Syst Engn, S-35195 Vaxjo, Sweden
[2] Uppsala Univ, Dept Linguist & Philol, S-75126 Uppsala, Sweden
基金
瑞典研究理事会;
关键词
Syntactics - Computational linguistics - Natural language processing systems;
D O I
10.1162/coli.07-056-R1-07-027
中图分类号
TP18 [人工智能理论];
学科分类号
081104 [模式识别与智能系统]; 0812 [计算机科学与技术]; 0835 [软件工程]; 1405 [智能科学与技术];
摘要
Parsing algorithms that process the input from left to right and construct a single derivation have often been considered inadequate for natural language parsing because of the massive ambiguity typically found in natural language grammars. Nevertheless, it has been shown that such algorithms, combined with treebank-induced classifiers, can be used to build highly accurate disambiguating parsers, in particular for dependency-based syntactic representations. In this article, we first present a general framework for describing and analyzing algorithms for deterministic incremental dependency parsing, formalized as transition systems. We then describe and analyze two families of such algorithms: stack-based and list-based algorithms. In the former family, which is restricted to projective dependency structures, we describe an arc-eager and an arc-standard variant; in the latter family, we present a projective and a non-projective variant. For each of the four algorithms, we give proofs of correctness and complexity. In addition, we perform an experimental evaluation of all algorithms in combination with SVM classifiers for predicting the next parsing action, using data from thirteen languages. We show that all four algorithms give competitive accuracy, although the non-projective list-based algorithm generally outperforms the projective algorithms for languages with a non-negligible proportion of non-projective constructions. However, the projective algorithms often produce comparable results when combined with the technique known as pseudo-projective parsing. The linear time complexity of the stack-based algorithms gives them an advantage with respect to efficiency both in learning and in parsing, but the projective list-based algorithm turns out to be equally efficient in practice. Moreover, when the projective algorithms are used to implement pseudo-projective parsing, they sometimes become less efficient in parsing (but not in learning) than the non-projective list-based algorithm. Although most of the algorithms have been partially described in the literature before, this is the first comprehensive analysis and evaluation of the algorithms within a unified framework.
引用
收藏
页码:513 / 553
页数:41
相关论文
共 47 条
[1]
MEMORY REQUIREMENTS AND LOCAL AMBIGUITIES OF PARSING STRATEGIES [J].
ABNEY, SP ;
JOHNSON, M .
JOURNAL OF PSYCHOLINGUISTIC RESEARCH, 1991, 20 (03) :233-250
[2]
Aho Alfred V., 1986, ADDISON WESLEY SERIE
[3]
[Anonymous], 2005, P 43 ANN M ASS COMPU, DOI DOI 10.3115/1219840.1219852
[4]
[Anonymous], 1980, THEORY SYNTACTIC REC
[5]
[Anonymous], 2006, P 10 C COMP NAT LANG
[6]
Attardi Giuseppe., 2006, P 10 C COMPUTATIONAL, P166
[7]
Bohmova A., 2003, The Prague Dependency Treebank, P103
[8]
Buchholz Sabine., 2006, 10 C COMPUTATIONAL N, P149, DOI [10.3115/1596276.1596305, 10.33218/001c.13521, DOI 10.33218/001C.13521]
[9]
LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[10]
CHENG YC, 2005, P INT C CHIN COMP SI, P66