Unsupervised learning of human action categories using spatial-temporal words

Cited by: 955

Authors:

Niebles, Juan Carlos ^{[1,2]}

Wang, Hongcheng ^{[3]}

Fei-Fei, Li ^{[4]}

Affiliations:

[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA

[2] Univ Norte, Robot & Intelligent Syst Grp, Barranquilla, Colombia

[3] United Technol Res Ctr, E Hartford, CT 06108 USA

[4] Princeton Univ, Dept Comp Sci, Princeton, NJ 08540 USA

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2008 / Vol. 79 / Issue 03

Funding:

National Science Foundation (USA);

Keywords:

action categorization; bag of words; spatio-temporal interest points; topic models; unsupervised learning;

DOI:

10.1007/s11263-007-0122-4

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA). Our approach can handle noisy feature points arising from dynamic background and moving cameras due to the application of the probabilistic models. Given a novel video sequence, the algorithm can categorize and localize the human action(s) contained in the video. We test our algorithm on three challenging datasets: the KTH human motion dataset, the Weizmann human action dataset, and a recent dataset of figure skating actions. Our results reflect the promise of such a simple approach. In addition, our algorithm can recognize and localize multiple actions in long and complex video sequences containing multiple motions.
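To make the topic-model step in the abstract concrete, the following is a minimal sketch of generic pLSA fitted by EM on a bag-of-words count matrix. It is not the authors' implementation: interest-point detection and codebook construction are omitted, and the function name `fit_plsa`, its parameters, and the `toy_counts` matrix (rows standing in for videos, columns for spatial-temporal codewords) are hypothetical names chosen for illustration.

```python
import numpy as np


def fit_plsa(counts, n_topics, n_iters=100, seed=0):
    """Fit pLSA by EM on a (videos x codewords) count matrix.

    Returns p_z_d with shape (videos, topics) and
    p_w_z with shape (topics, codewords).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random, row-normalised initial guesses for P(z|d) and P(w|z).
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]  # (D, K, W)
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both tables from n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * post  # (D, K, W)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z


# Hypothetical toy data: six "videos" over six codewords, where the
# first three videos and the last three use disjoint codewords.
toy_counts = np.array([
    [9, 8, 7, 0, 0, 0],
    [8, 9, 6, 0, 0, 0],
    [7, 7, 9, 0, 0, 0],
    [0, 0, 0, 8, 9, 7],
    [0, 0, 0, 9, 7, 8],
    [0, 0, 0, 6, 8, 9],
])

p_z_d, p_w_z = fit_plsa(toy_counts, n_topics=2)
# Assign each video to its most probable latent topic.
labels = p_z_d.argmax(axis=1)
```

In this sketch each video is assigned to the topic with the largest P(z|d), which mirrors the unsupervised categorization idea described in the abstract. Topic indices are arbitrary, so only the grouping of videos is meaningful, not which topic number a group receives.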

Citation

Pages: 299-318

Page count: 20

37 references in total

[1] [Anonymous], 2 JOINT IEEE INT WOR, DOI 10.1109/VSPETS.2005.1570899
[2] [Anonymous], P 10 IEEE COMP SOC I
[3] [Anonymous], 2004, ECCV INT WORKSH STAT
[4] Blank M, 2005, IEEE I CONF COMP VIS, P1395
[5] Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5): 993-1022
[6] Bobick AF, Davis JW. The recognition of human movement using temporal templates [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2001, 23 (03): 257-267
[7] Boiman O, 2005, IEEE I CONF COMP VIS, P462
[8] Cheung V, 2005, PROC CVPR IEEE, P42
[9] Dalal N, 2006, ECCV, V2, P428
[10] Efros AA, 2003, NINTH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS I AND II, PROCEEDINGS, P726
