Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Cited by: 614
Authors
Jain, Ashesh [1, 2]
Zamir, Amir R. [2]
Savarese, Silvio [2]
Saxena, Ashutosh [3]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Brain Things Inc, Redwood City, CA USA
Source
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 2016
Funding
U.S. National Science Foundation
DOI
10.1109/CVPR.2016.573
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification code
140502 [Artificial intelligence]
Abstract
Deep Recurrent Neural Network architectures, though remarkably capable at modeling sequences, lack an intuitive high-level spatio-temporal structure, even though many problems in computer vision inherently have such an underlying structure and can benefit from it. Spatio-temporal graphs are a popular tool for imposing such high-level intuitions in the formulation of real-world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs with the sequence-learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can transform any spatio-temporal graph by following a set of well-defined steps. Evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state of the art by a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks.
Pages: 5308-5317
Number of pages: 10