Action recognition by spatio-temporal oriented energies

Times cited: 45
Authors
Zhen, Xiantong [1, 2]
Shao, Ling [1, 2]
Li, Xuelong [3]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Coll Elect & Informat Engn, Nanjing 210044, Jiangsu, Peoples R China
[2] Univ Sheffield, Dept Elect & Elect Engn, Sheffield S1 3JD, S Yorkshire, England
[3] Chinese Acad Sci, Xian Inst Opt & Precis Mech, State Key Lab Transient Opt & Photon, Xian 710119, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Steerable filters; Spatio-temporal oriented energies; Spatio-temporal Laplacian pyramid; MODELS;
DOI
10.1016/j.ins.2014.05.021
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
In this paper, we present a unified holistic representation of human actions based on the spatio-temporal steerable pyramid (STSP). A video sequence is viewed as a spatio-temporal volume that preserves all the appearance and motion information of the action it contains. By decomposing the spatio-temporal volume into band-pass sub-volumes, the spatio-temporal Laplacian pyramid provides an effective technique for multi-scale analysis of video sequences, so that spatio-temporal patterns at different scales can be well localized and captured. To efficiently explore the underlying local spatio-temporal orientation structures at multiple scales, a bank of three-dimensional separable steerable filters is applied to each sub-volume of the Laplacian pyramid. The outputs of each quadrature pair of steerable filters are squared and summed to yield a more robust oriented energy representation. To achieve further invariance and compactness, a spatio-temporal max pooling operation is performed between the filter responses at adjacent scales and over spatio-temporal neighbourhoods. To capture the appearance, local geometric structure and motion of an action, we apply the STSP to the intensity, 3D gradients and optical flow of video sequences, yielding a unified holistic representation of human actions. Taking advantage of multi-scale, multi-orientation analysis and feature pooling, the STSP produces a compact yet informative and invariant representation of human actions. We conduct extensive experiments on the KTH, UCF Sports and HMDB51 datasets, which show that the unified STSP achieves results comparable to those of state-of-the-art methods. (C) 2014 Elsevier Inc. All rights reserved.
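The pipeline in the abstract (Laplacian pyramid, quadrature-pair oriented energy, cross-scale and neighbourhood max pooling) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not from the paper: a 3D Gabor quadrature pair (cosine/sine) stands in for the authors' separable steerable filters, so there is no steering step; filtering is circular FFT convolution; the pyramid uses a 5-tap binomial kernel; the coarser energy map is upsampled by repetition before the cross-scale max; and pooling uses non-overlapping (2, 4, 4) blocks. Every function name and parameter value (`sigma`, `freq`, `radius`, `block`, the direction list) is hypothetical, and only raw intensity is processed, whereas the paper also applies the STSP to 3D gradients and optical flow.

```python
import numpy as np

# Illustrative sketch only: every name and parameter here is hypothetical,
# and the Gabor pair stands in for the paper's separable steerable filters.

def blur3d(vol):
    """Separable 5-tap binomial blur along t, y and x (edge padding)."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    out = np.asarray(vol, dtype=float)
    for axis in range(3):
        pad = [(0, 0)] * 3
        pad[axis] = (2, 2)
        p = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        out = sum(k[i] * np.take(p, np.arange(i, i + n), axis=axis) for i in range(5))
    return out

def downsample(vol):
    return blur3d(vol)[::2, ::2, ::2]

def upsample(vol, shape):
    """Zero-insert into `shape`, blur, and scale by 8 (factor 2 per axis)."""
    up = np.zeros(shape)
    up[::2, ::2, ::2] = vol
    return 8.0 * blur3d(up)

def laplacian_pyramid(vol, levels=3):
    """Band-pass sub-volumes (fine to coarse) followed by a low-pass residual."""
    bands, cur = [], np.asarray(vol, dtype=float)
    for _ in range(levels - 1):
        low = downsample(cur)
        bands.append(cur - upsample(low, cur.shape))
        cur = low
    bands.append(cur)
    return bands

def quadrature_pair(direction, sigma=2.0, freq=0.25, radius=6):
    """Even (cosine) and odd (sine) 3D Gabor kernels tuned to a (t, y, x) direction."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    r = np.arange(-radius, radius + 1, dtype=float)
    t, y, x = np.meshgrid(r, r, r, indexing="ij")
    env = np.exp(-(t**2 + y**2 + x**2) / (2 * sigma**2))
    phase = 2 * np.pi * freq * (d[0] * t + d[1] * y + d[2] * x)
    even = env * np.cos(phase)
    even -= even.mean()  # zero DC: constant regions give no even response
    odd = env * np.sin(phase)  # zero-mean by odd symmetry
    return even, odd

def filter3d(vol, kernel):
    """Circular 3D convolution via FFT, with the kernel centred at the origin."""
    k = np.zeros(vol.shape)
    r = kernel.shape[0] // 2
    for idx in np.ndindex(kernel.shape):
        k[tuple((i - r) % s for i, s in zip(idx, vol.shape))] += kernel[idx]
    return np.real(np.fft.ifftn(np.fft.fftn(vol) * np.fft.fftn(k)))

def oriented_energy(vol, even, odd):
    """Square and sum the quadrature responses (phase-invariant energy)."""
    return filter3d(vol, even) ** 2 + filter3d(vol, odd) ** 2

def cross_scale_max(fine, coarse):
    """Element-wise max with the next coarser scale, upsampled by repetition."""
    up = coarse.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)
    return np.maximum(fine, up[: fine.shape[0], : fine.shape[1], : fine.shape[2]])

def max_pool(vol, block=(2, 4, 4)):
    """Non-overlapping spatio-temporal max pooling; remainders are cropped."""
    n = [s // b for s, b in zip(vol.shape, block)]
    v = vol[: n[0] * block[0], : n[1] * block[1], : n[2] * block[2]]
    return v.reshape(n[0], block[0], n[1], block[1], n[2], block[2]).max(axis=(1, 3, 5))

def stsp_descriptor(vol, directions, levels=3):
    """Pooled oriented energies for each direction and adjacent pair of band-pass scales."""
    bands = laplacian_pyramid(vol, levels)[:-1]  # needs levels >= 3 for two bands
    feats = []
    for d in directions:
        even, odd = quadrature_pair(d)
        energies = [oriented_energy(b, even, odd) for b in bands]
        for fine, coarse in zip(energies[:-1], energies[1:]):
            feats.append(max_pool(cross_scale_max(fine, coarse)).ravel())
    return np.concatenate(feats)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((16, 32, 32))  # (frames, height, width)
    dirs = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
    print(stsp_descriptor(video, dirs).shape)  # (2048,)
```

Squaring and summing the even and odd responses cancels the dependence on the local phase of a pattern (cos² + sin² = 1), which is one reason the abstract describes the energy as more robust than either filter output alone. Because each band is defined as the residual against the upsampled coarser level, the sub-volumes reconstruct the input exactly, so the multi-scale decomposition loses no information before filtering.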
Pages: 295-309
Page count: 15