Learning to talk about events from narrated video in a construction grammar framework

Cited by: 51
Authors
Dominey, PF [1 ]
Boucher, JD [1 ]
Affiliations
[1] CNRS, Inst Cognit Sci, F-69675 Bron, France
Keywords
grammatical construction; language acquisition; event recognition; language technology
DOI
10.1016/j.artint.2005.06.007
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current research presents a system that learns to understand object names, spatial relation terms and event descriptions from observing narrated action sequences. The system extracts meaning from observed visual scenes by exploiting perceptual primitives related to motion and contact in order to represent events and spatial relations as predicate-argument structures. Learning the mapping between sentences and the predicate-argument representations of the situations they describe results in the development of a small lexicon, and a structured set of sentence form-to-meaning mappings, or simplified grammatical constructions. The acquired grammatical construction knowledge generalizes, allowing the system to correctly understand new sentences not used in training. In the context of discourse, the grammatical constructions are used in the inverse sense to generate sentences from meanings, allowing the system to describe visual scenes that it perceives. In question and answer dialogs with naive users the system exploits pragmatic cues in order to select grammatical constructions that are most relevant in the discourse structure. While the system embodies a number of limitations that are discussed, this research demonstrates how concepts borrowed from the construction grammar framework can aid in taking initial steps towards building systems that can acquire and produce event language through interaction with the world. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 31-61
Page count: 31