Spatio-Temporal Convolutional Sparse Auto-Encoder for Sequence Classification

Cited by: 43
Authors
Baccouche, Moez [1 ]
Mamalet, Franck [1 ]
Wolf, Christian [2 ]
Garcia, Christophe [2 ]
Baskurt, Atilla [2 ]
Affiliations
[1] Orange Labs R&D, 4 Rue du Clos Courtel, F-35510 Cesson-Sévigné, France
[2] Univ Lyon, CNRS INSA Lyon, LIRIS, UMR, F-69621 Villeurbanne, France
Source
PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2012 | 2012
Keywords
SCALE;
DOI
10.5244/C.26.124
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Code
140502 [Artificial Intelligence];
Abstract
We present in this paper a novel learning-based approach for video sequence classification. Contrary to the dominant methodology, which relies on hand-crafted features manually engineered to be optimal for a specific task, our neural model automatically learns a sparse, shift-invariant representation of the local 2D + t salient information, without any use of prior knowledge. To that aim, a spatio-temporal convolutional sparse auto-encoder is trained to project a given input into a feature space and to reconstruct it from its projection coordinates. Learning is performed in an unsupervised manner by minimizing a global parametrized objective function. Sparsity is ensured by adding a sparsifying logistic between the encoder and the decoder, while shift-invariance is handled by including an additional hidden variable in the objective function. The temporal evolution of the obtained sparse features is learned by a long short-term memory recurrent neural network trained to classify each sequence. We show that, since the feature learning process is problem-independent, the model achieves outstanding performance when applied to two different problems, namely human action and facial expression recognition. The obtained results are superior to the state of the art on the GEMEP-FERA dataset and among the very best on the KTH dataset.
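The sparsifying logistic the abstract places between the encoder and the decoder is the non-linearity introduced by Ranzato et al. (NIPS 2006): each code unit is passed through a logistic whose normalization term is a running, exponentially weighted sum over training samples, which drives the outputs toward sparse values in (0, 1). The sketch below is an illustrative NumPy implementation under that definition; the parameter names `eta` and `beta` and the per-unit running denominator are assumptions for illustration, not details taken from this paper:

```python
import numpy as np

def sparsifying_logistic(codes, eta=0.02, beta=1.0):
    """Apply the sparsifying logistic to a sequence of code vectors.

    codes : array of shape (num_samples, num_units), raw encoder outputs.
    eta   : weight of the current sample in the running denominator
            (small eta -> sparser outputs).
    beta  : gain of the exponential.

    Returns an array of the same shape with all values in (0, 1).
    """
    zeta = np.ones(codes.shape[1])  # running denominator, one per unit
    out = np.empty_like(codes, dtype=float)
    for k, z in enumerate(codes):
        num = eta * np.exp(beta * z)
        # denominator mixes the current term with its own past values
        zeta = num + (1.0 - eta) * zeta
        out[k] = num / zeta
    return out
```

Because the denominator always exceeds the current numerator, every output lies strictly between 0 and 1, and units that are not consistently active are pushed toward 0, which is the sparsity effect the abstract relies on.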
Pages: 12