Natural language description of human activities from video images based on concept hierarchy of actions

Cited by: 233
Authors
Kojima, A
Tamura, T
Fukunaga, K
Affiliations
[1] Osaka Prefecture University, Library and Science Information Center, Osaka 599-8531, Japan
[2] Osaka Prefecture University, Graduate School of Engineering, Osaka 599-8531, Japan
Keywords
natural language generation; concept hierarchy; semantic primitive; position/posture estimation of human; case frame
DOI
10.1023/A:1020346032608
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a method for describing human activities from video images based on concept hierarchies of actions. A major difficulty in transforming video images into textual descriptions is bridging the semantic gap between them, also known as the inverse Hollywood problem. In general, concepts of human events or actions can be classified by semantic primitives. By associating these concepts with semantic features extracted from video images, appropriate syntactic components such as verbs and objects are determined and then translated into natural language sentences. We also demonstrate the performance of the proposed method through several experiments.
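The pipeline summarized in the abstract (semantic features, then a concept in the action hierarchy, then syntactic components, then a sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ActionConcept` class, the hierarchy itself, the feature names (`body_moving`, `toward_object`, `object_in_hand`, `head_toward_object`) and the simplified agent/verb/object case frame are all hypothetical, and the sketch assumes feature extraction from video has already produced boolean features.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ActionConcept:
    """A node in a hypothetical concept hierarchy of actions."""
    verb: str                       # verb emitted when this is the most specific match
    conditions: Dict[str, bool]     # hypothetical semantic features required at this node
    takes_object: bool = False      # whether the verb's case frame has an object slot
    children: List["ActionConcept"] = field(default_factory=list)

    def matches(self, features: Dict[str, bool]) -> bool:
        # Every condition must be present in the features with the required value.
        return all(features.get(name) == value for name, value in self.conditions.items())


def select_concept(root: ActionConcept, features: Dict[str, bool]) -> ActionConcept:
    """Descend from the root, taking the first matching child, until none matches."""
    node = root
    while True:
        child = next((c for c in node.children if c.matches(features)), None)
        if child is None:
            return node
        node = child


def describe(agent: str, obj: Optional[str], features: Dict[str, bool],
             root: ActionConcept) -> str:
    """Fill a simplified agent/verb/object case frame and render it as a sentence."""
    concept = select_concept(root, features)
    frame = {"agent": agent, "verb": concept.verb}
    if concept.takes_object and obj is not None:
        frame["object"] = obj
    words = [frame["agent"], frame["verb"]]
    if "object" in frame:
        words.append(frame["object"])
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


# Hypothetical hierarchy: general concepts near the root, specific verbs at the leaves.
ROOT = ActionConcept("acts", {}, children=[
    ActionConcept("moves", {"body_moving": True}, children=[
        ActionConcept("approaches", {"toward_object": True}, takes_object=True),
    ]),
    ActionConcept("stays", {"body_moving": False}, children=[
        ActionConcept("holds", {"object_in_hand": True}, takes_object=True),
        ActionConcept("looks at", {"head_toward_object": True}, takes_object=True),
    ]),
])

print(describe("a person", "the chair", {"body_moving": True, "toward_object": True}, ROOT))
# prints "A person approaches the chair."
```

When no child of a node matches, the sketch falls back to that node's more general verb (for example, "A person moves." when only `body_moving` is known), so a coarser concept still yields a valid, if less specific, description.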
Pages: 171-184
Number of pages: 14