Fusing Multiple Features for Depth-Based Action Recognition

Cited by: 52
Authors
Zhu, Yu [1 ]
Chen, Wenbin [1 ]
Guo, Guodong [1 ]
Affiliations
[1] W Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
Keywords
Algorithms; Experimentation; Performance; Human Factors; RGB-D sensor; depth maps; action recognition; spatiotemporal features; skeleton; 4D descriptor; data fusion; decision level; feature level; feature selection; CLASSIFIER FUSION;
DOI
10.1145/2629483
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Code
140502 [Artificial Intelligence]
Abstract
Human action recognition is a very active research topic in computer vision and pattern recognition. Recently, the three-dimensional (3D) depth data captured by emerging RGB-D sensors have shown great potential for human action recognition. Several features and/or algorithms have been proposed for depth-based action recognition. A natural question arises: Can we find complementary features and combine them to improve the accuracy of depth-based action recognition significantly? To address this question and gain a better understanding of the problem, we study the fusion of different features for depth-based action recognition. Although data fusion has shown great success in other areas, it has not yet been well studied for 3D action recognition. Several issues need to be addressed, for example, whether fusion is helpful for depth-based action recognition, and how to perform the fusion properly. In this article, we study different fusion schemes comprehensively, using diverse features for action characterization in depth videos. Two levels of fusion schemes are investigated, namely the feature level and the decision level, and various methods are explored at each level. Four different features are considered to characterize depth action patterns from different aspects. The experiments are conducted on four challenging depth action databases, in order to evaluate the fusion methods and identify the best ones in general. Our experimental results show that the four features investigated in the article complement each other, and that appropriate fusion methods can improve the recognition accuracies significantly over each individual feature. More importantly, our fusion-based action recognition outperforms the state-of-the-art approaches on these challenging databases.
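The abstract contrasts fusion at two levels: combining descriptors before classification (feature level) versus combining per-classifier scores after classification (decision level). A minimal sketch of these two generic schemes, not the authors' implementation, is shown below with toy score vectors; the function names, the equal default weights, and the weighted-sum combination rule are illustrative assumptions.

```python
def feature_level_fusion(feature_vectors):
    """Feature-level fusion: concatenate the per-feature descriptors
    into a single vector, which a classifier is then trained on."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def decision_level_fusion(score_vectors, weights=None):
    """Decision-level fusion: combine the per-class score vectors from
    several classifiers with a weighted-sum rule (equal weights by default)."""
    n = len(score_vectors)
    if weights is None:
        weights = [1.0 / n] * n  # illustrative choice; the article studies several rules
    num_classes = len(score_vectors[0])
    fused = [0.0] * num_classes
    for w, scores in zip(weights, score_vectors):
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def predict(fused_scores):
    """Final decision: the class with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

For example, two classifiers with class scores `[0.2, 0.8]` and `[0.6, 0.4]` fuse (with equal weights) to `[0.4, 0.6]`, so the final prediction is class 1; feature-level fusion would instead concatenate the raw descriptors before any classifier is applied.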
Pages: 20
Cited References
52 items in total
[1]
Human Activity Analysis: A Review [J].
Aggarwal, J. K. ;
Ryoo, M. S. .
ACM COMPUTING SURVEYS, 2011, 43 (03)
[2]
Experimental evaluation of expert fusion strategies [J].
Alkoot, FM ;
Kittler, J .
PATTERN RECOGNITION LETTERS, 1999, 20 (11-13) :1361-1369
[3]
[Anonymous], IEEE SYST J
[4]
[Anonymous], J NANOTECHNOL
[5]
Multimodal fusion for multimedia analysis: a survey [J].
Atrey, Pradeep K. ;
Hossain, M. Anwar ;
El Saddik, Abdulmotaleb ;
Kankanhalli, Mohan S. .
MULTIMEDIA SYSTEMS, 2010, 16 (06) :345-379
[6]
SURF: Speeded up robust features [J].
Bay, Herbert ;
Tuytelaars, Tinne ;
Van Gool, Luc .
COMPUTER VISION - ECCV 2006 , PT 1, PROCEEDINGS, 2006, 3951 :404-417
[7]
Fusion of face and speech data for person identity verification [J].
Ben-Yacoub, S ;
Abdeljaoued, Y ;
Mayoraz, E .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1999, 10 (05) :1065-1074
[8]
Bingbing Ni, 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), P1147, DOI 10.1109/ICCVW.2011.6130379
[9]
Breiman L, 2001, MACH LEARN, V45, P5
[10]
Brown G, 2012, J MACH LEARN RES, V13, P27