Histogram of oriented rectangles: A new pose descriptor for human action recognition

Cited by: 136
Authors
Ikizler, Nazli [1 ]
Duygulu, Pinar [1 ]
Affiliations
[1] Bilkent Univ, Dept Comp Engn, Ankara, Turkey
Keywords
Action recognition; Human motion understanding; Pose descriptor
DOI
10.1016/j.imavis.2009.02.002
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Code
140502 [Artificial Intelligence]
Abstract
Most approaches to human action recognition form complex models that require extensive parameter estimation and computation time. In this study, we show that human actions can be represented simply by pose, without dealing with a complex representation of dynamics. Based on this idea, we propose a novel pose descriptor, which we name Histogram-of-Oriented-Rectangles (HOR), for representing and recognizing human actions in videos. We represent each human pose in an action sequence by oriented rectangular patches extracted over the human silhouette. We then form spatial oriented histograms to represent the distribution of these rectangular patches. We make use of several matching strategies to carry the information described by the HOR descriptor from the spatial domain to the temporal domain: (i) nearest-neighbor classification, which recognizes actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea of Bobick and Davis to rectangular patches; (iii) a classifier-based approach using Support Vector Machines; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the HOR descriptor. For cases in which the pose descriptor alone is not sufficiently discriminative, such as differentiating the actions "jogging" and "running", we also incorporate a simple velocity descriptor as a prior to the pose-based classification step. We test our system in different configurations on two commonly used action datasets: the Weizmann dataset and the KTH dataset. Our method outperforms other methods on the Weizmann dataset with a perfect accuracy of 100%, and is comparable to other methods on the KTH dataset with a success rate close to 90%. These results show that, with a simple and compact representation, we can achieve robust recognition of human actions compared to complex representations. (C) 2009 Elsevier B.V. All rights reserved.
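The descriptor described in the abstract can be sketched as follows: each rectangular patch found on the silhouette contributes one vote to a spatial-grid-by-orientation histogram. This is a minimal illustrative sketch, not the authors' implementation; the grid size, number of orientation bins, input format `(cx, cy, angle_deg)`, and L1 normalization are assumptions chosen for clarity.

```python
import numpy as np

def hor_descriptor(rects, frame_w, frame_h, grid=3, n_orient_bins=15):
    """Sketch of a Histogram-of-Oriented-Rectangles (HOR) pose descriptor.

    rects: iterable of (cx, cy, angle_deg) tuples, one per oriented
           rectangular patch extracted over the human silhouette;
           cx, cy are the patch center in pixels, angle_deg in [0, 180).
    Returns a flattened (grid x grid x n_orient_bins) histogram.
    NOTE: grid=3 and n_orient_bins=15 are illustrative defaults,
    not the parameters reported in the paper.
    """
    hist = np.zeros((grid, grid, n_orient_bins))
    for cx, cy, angle in rects:
        # Spatial cell of the patch center within the frame.
        gx = min(int(cx / frame_w * grid), grid - 1)
        gy = min(int(cy / frame_h * grid), grid - 1)
        # Orientation bin over [0, 180) degrees.
        ob = min(int(angle / 180.0 * n_orient_bins), n_orient_bins - 1)
        hist[gy, gx, ob] += 1
    total = hist.sum()
    if total > 0:
        hist /= total  # normalize so the descriptor is invariant to patch count
    return hist.ravel()

# Example: two patches in opposite corners of a 100x100 frame.
d = hor_descriptor([(10, 10, 0), (90, 90, 170)], 100, 100)
```

Per-frame descriptors like this one can then be compared with any of the matching strategies the paper lists (frame-wise nearest neighbor, global histogramming, an SVM, or Dynamic Time Warping over the frame sequence).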
Pages: 1515-1526 (12 pages)
Cited References (41 total)
[1] [Anonymous], Int. Conf. on Computer Vision.
[2] [Anonymous], Visual Surveillance Workshop.
[3] [Anonymous], Int. Conf. on Computer Vision.
[4] Blank M., 2005, IEEE Int. Conf. on Computer Vision, p. 1395.
[5] Bobick A.F., Davis J.W. The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(3):257-267.
[6] Brand M., Oliver N., Pentland A. Coupled hidden Markov models for complex action recognition. 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Proceedings, 1997:994-999.
[7] Dalal N., 2005, IEEE Conf. on Computer Vision and Pattern Recognition.
[8] Dollar P., 2005, Proceedings, 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS), p. 65.
[9] Efros A.A., 2003, Ninth IEEE International Conference on Computer Vision, Vols. I and II, Proceedings, p. 726.
[10] Fei-Fei L., 2005, Proc. CVPR IEEE, p. 524.