Learning to Detect Human-Object Interactions with Knowledge

Cited by: 119
Authors
Xu, Bingjie [1 ]
Wong, Yongkang [1 ]
Li, Junnan [1 ]
Zhao, Qi [2 ]
Kankanhalli, Mohan S. [1 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Univ Minnesota, Minneapolis, MN 55455 USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Funding
National Research Foundation, Singapore
DOI
10.1109/CVPR.2019.00212
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Code
140502 [Artificial Intelligence]
Abstract
The recent advances in instance-level detection tasks lay a strong foundation for automated visual scene understanding. However, the ability to fully comprehend a social scene still eludes us. In this work, we focus on detecting human-object interactions (HOIs) in images, an essential step towards deeper scene understanding. HOI detection aims to localize humans and objects, as well as to identify the complex interactions between them. As is innate in practical problems with a large label space, HOI categories exhibit a long-tail distribution, i.e., there exist some rare categories with very few training samples. Given the key observation that HOIs contain intrinsic semantic regularities despite being visually diverse, we tackle the challenge of long-tail HOI categories by modeling the underlying regularities among verbs and objects in HOIs, as well as general relationships. In particular, we construct a knowledge graph based on the ground-truth annotations of the training dataset and an external source. In contrast to direct knowledge incorporation, we address the necessity of dynamic image-specific knowledge retrieval through multi-modal learning, which leads to an enhanced semantic embedding space for HOI comprehension. The proposed method shows improved performance on the V-COCO and HICO-DET benchmarks, especially when predicting rare HOI categories.
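The abstract describes building a knowledge graph over verbs and objects from ground-truth HOI annotations, so that rare interactions can borrow structure from frequent ones. The paper's actual model learns embeddings with multi-modal retrieval; the following is only a minimal illustrative sketch (not the authors' implementation) of the underlying idea: a verb-object co-occurrence graph built from annotation pairs, queried to rank candidate verbs for a detected object. The annotation format and function names here are assumptions for illustration.

```python
from collections import defaultdict

def build_hoi_graph(annotations):
    """Build a simple verb-object co-occurrence graph from HOI annotations.

    `annotations` is a list of (verb, object) pairs, e.g. drawn from
    V-COCO ground truth. Nodes are verbs and objects; edge weights count
    how often a verb-object pair appears together.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for verb, obj in annotations:
        graph[verb][obj] += 1
        graph[obj][verb] += 1  # undirected: objects also index their verbs
    return graph

def candidate_verbs(graph, obj, top_k=3):
    """Rank verbs linked to a detected object by co-occurrence weight."""
    neighbors = graph.get(obj, {})
    return sorted(neighbors, key=neighbors.get, reverse=True)[:top_k]

# Toy annotations: a rare pair like (hold, bicycle) still benefits from
# the regularities shared with the frequent (ride, bicycle).
ann = [("ride", "bicycle"), ("ride", "bicycle"), ("ride", "horse"),
       ("hold", "bicycle"), ("hold", "cup")]
g = build_hoi_graph(ann)
print(candidate_verbs(g, "bicycle"))  # ['ride', 'hold']
```

In the paper this regularity modeling is learned rather than counted, and the retrieval step is image-specific, but the co-occurrence structure above is the kind of semantic regularity the knowledge graph encodes.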
Pages: 2019-2028 (10 pages)