Feature forest models for probabilistic HPSG parsing

被引:76
作者
Miyao, Yusuke [1 ]
Tsujii, Jun'ichi [1 ,2 ]
机构
[1] Univ Tokyo, Dept Comp Sci, Bunkyo Ku, Tokyo 1130033, Japan
[2] Univ Manchester, Manchester M13 9PL, Lancs, England
关键词
Syntactics;
D O I
10.1162/coli.2008.34.1.35
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Probabilistic modeling of lexicalized grammars is difficult because these grammars exploit complicated data structures, such as typed feature structures. This prevents us from applying common methods of probabilistic modeling in which a complete structure is divided into substructures under the assumption of statistical independence among sub-structures. For example, part-of-speech tagging of a sentence is decomposed into tagging of each word, and CFG parsing is split into applications of CFG rules. These methods have relied on the structure of the target problem, namely lattices or trees, and cannot be applied to graph structures including typed feature structures. This article proposes the feature forest model as a solution to the problem of probabilistic modeling of complex data structures including typed feature structures. The feature forest model provides a method for probabilistic modeling without the independence assumption when probabilistic events are represented with feature forests, Feature forests are generic data structures that represent ambiguous trees in a packed forest structure. Feature forest models are maximum entropy models defined over feature forests. A dynamic programming algorithm is proposed for maximum entropy estimation without unpacking feature forests. Thus probabilistic modeling of any data structures is possible when they are represented by feature forests. This article also describes methods for representing HPSG syntactic structures and predicate-argument structures with feature forests. Hence, we describe a complete strategy for developing probabilistic models for HPSG parsing. The effectiveness of the proposed methods is empirically evaluated through parsing experiments on the Penn Treebank, and the promise of applicability to parsing of real-world sentences is discussed.
引用
收藏
页码:35 / 80
页数:46
相关论文
共 83 条
[1]  
Abney SP, 1997, COMPUT LINGUIST, V23, P597
[2]  
[Anonymous], 2004, P 42 ANN M ASS COMP
[3]  
[Anonymous], P 3 INT C LANG RES E
[4]  
[Anonymous], [No title captured]
[5]  
[Anonymous], 1999, HEAD DRIVEN STAT MOD
[6]  
[Anonymous], 2005, P 9 INTERNATIONALWOR
[7]  
BALDRIDGE J, 2003, P ANN M N AM ASS COM, P17
[8]  
Berger AL, 1996, COMPUT LINGUIST, V22, P39
[9]  
BURKE M, 2004, P IJCNLP 04 WORKSH
[10]  
Carpenter B., 1992, LOGIC TYPED FEATURE, DOI 10.1017/CBO9780511530098