Automatic Labeling of Semantic Roles

Times cited: 655
Authors
Gildea, D
Jurafsky, D
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Int Comp Sci Inst, Berkeley, CA 94704 USA
[3] Univ Colorado, Dept Linguist, Boulder, CO 80309 USA
[4] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
DOI
10.1162/089120102760275983
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Given an input sentence and a target word and frame, the system labels constituents with either abstract semantic roles, such as Agent or Patient, or more domain-specific semantic roles, such as Speaker, Message, and Topic. The system is based on statistical classifiers trained on roughly 50,000 sentences that were hand-annotated with semantic roles by the FrameNet semantic labeling project. We then parsed each training sentence into a syntactic tree and extracted various lexical and syntactic features, including the phrase type of each constituent, its grammatical function, and its position in the sentence. These features were combined with knowledge of the predicate verb, noun, or adjective, as well as information such as the prior probabilities of various combinations of semantic roles. We used various lexical clustering algorithms to generalize across possible fillers of roles. Test sentences were parsed, annotated with these features, and then passed through the classifiers. Our system achieves 82% accuracy in identifying the semantic role of presegmented constituents. At the more difficult task of simultaneously segmenting constituents and identifying their semantic roles, the system achieves 65% precision and 61% recall. Our study also allowed us to compare the usefulness of different features and feature combination methods in the semantic role labeling task. We also explore the integration of role labeling with statistical syntactic parsing and attempt to generalize to predicates unseen in the training data.
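The abstract describes combining features such as phrase type, grammatical function, position and the predicate through statistical classifiers, and comparing feature combination methods. As a hedged illustration only, the sketch below shows one simple combination method: it estimates P(role | features) by linearly interpolating relative-frequency distributions, each conditioned on a different feature subset, and renormalizes over the subsets whose context was seen in training. The class name, the feature names (`target`, `pt`, `gf`, `position`), the chosen subsets and the weights are hypothetical, and the toy training instances are invented for illustration; this is not the authors' implementation.

```python
from collections import Counter, defaultdict


class InterpolatedRoleClassifier:
    """Hypothetical sketch: P(role | features) as a linear interpolation of
    relative-frequency estimates, each conditioned on one feature subset."""

    def __init__(self, subsets, weights):
        if len(subsets) != len(weights):
            raise ValueError("need one weight per feature subset")
        self.subsets = [tuple(s) for s in subsets]
        self.weights = list(weights)
        # counts[subset][context] is a Counter of roles seen in that context
        self.counts = {s: defaultdict(Counter) for s in self.subsets}
        self.roles = set()

    def fit(self, data):
        """data: iterable of (feature dict, role label) pairs."""
        for feats, role in data:
            self.roles.add(role)
            for s in self.subsets:
                context = tuple(feats[f] for f in s)
                self.counts[s][context][role] += 1
        return self

    def prob(self, role, feats):
        """Interpolated estimate, renormalized over the subsets whose
        context was seen in training (0.0 if none was seen)."""
        mass, used_weight = 0.0, 0.0
        for s, w in zip(self.subsets, self.weights):
            dist = self.counts[s].get(tuple(feats[f] for f in s))
            if dist:  # skip contexts never observed in training
                mass += w * dist[role] / sum(dist.values())
                used_weight += w
        return mass / used_weight if used_weight else 0.0

    def predict(self, feats):
        """Most probable role; ties broken alphabetically."""
        return max(sorted(self.roles), key=lambda r: self.prob(r, feats))


# Invented toy instances (not FrameNet data), loosely modeled on a
# judgment-style frame with Judge, Evaluee and Reason roles.
TRAIN = [
    ({"target": "blame", "pt": "NP", "gf": "subj", "position": "before"}, "Judge"),
    ({"target": "blame", "pt": "NP", "gf": "obj", "position": "after"}, "Evaluee"),
    ({"target": "blame", "pt": "PP", "gf": "comp", "position": "after"}, "Reason"),
    ({"target": "criticize", "pt": "NP", "gf": "subj", "position": "before"}, "Judge"),
    ({"target": "criticize", "pt": "NP", "gf": "obj", "position": "after"}, "Evaluee"),
]
# Hypothetical subsets, from most to least specific, with assumed weights.
SUBSETS = [("target", "pt", "gf", "position"), ("target", "pt", "position"), ("pt", "gf")]
WEIGHTS = [0.5, 0.3, 0.2]

if __name__ == "__main__":
    clf = InterpolatedRoleClassifier(SUBSETS, WEIGHTS).fit(TRAIN)
    # Unseen predicate "accuse": only the (pt, gf) distribution applies.
    print(clf.predict({"target": "accuse", "pt": "NP", "gf": "subj", "position": "before"}))  # prints Judge
```

Interpolating over several subsets lets a sparse, fully specified context (here, a predicate absent from training) fall back on coarser distributions such as phrase type plus grammatical function, which is one way to approach the generalization to unseen predicates that the abstract mentions.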
Pages: 245-288
Page count: 44