Large-scale Video Classification with Convolutional Neural Networks

Cited by: 4126
Authors
Karpathy, Andrej [1, 2]
Toderici, George [1 ]
Shetty, Sanketh [1 ]
Leung, Thomas [1 ]
Sukthankar, Rahul [1 ]
Fei-Fei, Li [2 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2014
DOI
10.1109/CVPR.2014.223
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
Pages: 1725-1732
Number of pages: 8