Dependency parsing of Turkish

Cited by: 43
Authors
Eryigit, Gulsen [1]
Nivre, Joakim [2, 3]
Oflazer, Kemal [4]
Affiliations
[1] Istanbul Tech Univ, Dept Comp Engn, TR-34469 Istanbul, Turkey
[2] Vaxjo Univ, Sch Math & Syst Engn, S-35260 Vaxjo, Sweden
[3] Uppsala Univ, Dept Linguist & Philol, S-75126 Uppsala, Sweden
[4] Sabanci Univ, Fac Engn & Nat Sci, TR-34956 Istanbul, Turkey
Keywords
Natural language processing systems
DOI
10.1162/coli.2008.07-017-R1-06-83
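Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 [Pattern recognition and intelligent systems]; 0812 [Computer science and technology]; 0835 [Software engineering]; 1405 [Intelligent science and technology]
Abstract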
The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
Pages: 357-389
Page count: 33