Recognizing object manipulation activities using depth and visual cues

Cited by: 11
Authors
Liu, Haowei [1]
Philipose, Matthai [2]
Pettersson, Martin [1]
Sun, Ming-Ting [1]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
[2] Intel Labs Seattle, Seattle, WA USA
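Keywords
Activity recognition; Action recognition; Joint object and action recognition; HMM; Depth camera; Temporal action recognition; Temporal smoothing; Boost
DOI
10.1016/j.jvcir.2013.03.015
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract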
We propose a framework consisting of several algorithms to recognize human activities that involve manipulating objects. Our proposed algorithm identifies the objects being manipulated and models the high-level tasks being performed accordingly. Realistic settings for such tasks pose several problems for computer vision, including sporadic occlusion by subjects, non-frontal poses, and objects with few local features. We show how size and segmentation information derived from depth data can address these challenges using simple and fast techniques. In particular, we show how to find the manipulating hand robustly and without supervision, how to detect and recognize objects properly, and how to use temporal information to fill in the gaps between sporadically detected objects, all through careful inclusion of depth cues. We evaluate our approach on a challenging dataset of 12 kitchen tasks that involve 24 objects performed by 2 subjects. The entire framework yields 82%/84% precision (74%/83% recall) for task/object recognition. Our techniques significantly outperform the state of the art in activity/object recognition. © 2013 Elsevier Inc. All rights reserved.
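The abstract's idea of using temporal information to fill gaps between sporadically detected objects can be illustrated with a small hidden Markov model decoded by the Viterbi algorithm. The sketch below is not the authors' implementation: the function name `smooth_detections`, the two-object label set, and the probabilities `p_stay`, `p_detect`, and `p_correct` are all assumptions chosen for illustration. Each frame carries either a detected label or `None` (no detection), and the decoder returns one label per frame.

```python
import math

def smooth_detections(observations, labels, p_stay=0.9, p_detect=0.4, p_correct=0.9):
    """Viterbi decoding of per-frame detections; None marks a frame with no detection.

    All probabilities are illustrative assumptions, not values from the paper.
    Requires at least two labels and at least one observation.
    """
    k = len(labels)

    def log_trans(i, j):
        # Stay on the same object with p_stay, otherwise switch uniformly.
        return math.log(p_stay) if i == j else math.log((1.0 - p_stay) / (k - 1))

    def log_emit(state, obs):
        if obs is None:
            # A missed detection is equally likely under every state.
            return math.log(1.0 - p_detect)
        if obs == labels[state]:
            return math.log(p_detect * p_correct)
        # A detection that names the wrong object.
        return math.log(p_detect * (1.0 - p_correct) / (k - 1))

    # Uniform prior over objects for the first frame.
    score = [math.log(1.0 / k) + log_emit(s, observations[0]) for s in range(k)]
    back = []
    for obs in observations[1:]:
        prev, pointers, score = score, [], []
        for j in range(k):
            best = max(range(k), key=lambda i: prev[i] + log_trans(i, j))
            pointers.append(best)
            score.append(prev[best] + log_trans(best, j) + log_emit(j, obs))
        back.append(pointers)

    # Trace the best path backwards from the most likely final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [labels[s] for s in path]

# Hypothetical detections: frames without a detection are None.
frames = ["cup", None, None, "cup", None, "bowl", None, "bowl"]
print(smooth_detections(frames, ["cup", "bowl"]))
```

Because a missed detection is equally likely under every state, the transition model alone decides what fills each gap, which is why frames between two agreeing detections inherit that label.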
Pages: 719-726
Page count: 8