Large-scale Video Classification with Convolutional Neural Networks

Cited by: 4126

Authors:

Karpathy, Andrej [1,2]

Toderici, George [1]

Shetty, Sanketh [1]

Leung, Thomas [1]

Sukthankar, Rahul [1]

Fei-Fei, Li [2]

Affiliations:

[1] Google Res, Mountain View, CA 94043 USA

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | 2014

DOI:

10.1109/CVPR.2014.223

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
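
The abstract names a multiresolution, foveated architecture but gives no details. The sketch below is one possible reading, not the authors' implementation: it assumes two input streams per frame, a "context" stream holding the whole frame at half resolution and a "fovea" stream holding a full-resolution center crop. The function names, the 2x downsampling factor, and the half-size crop are all hypothetical choices made for illustration.

```python
def center_crop(frame, size):
    """Return the central size x size region of a 2D frame (list of rows)."""
    h, w = len(frame), len(frame[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def downsample_2x(frame):
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    h = len(frame) // 2 * 2      # ignore a trailing odd row
    w = len(frame[0]) // 2 * 2   # ignore a trailing odd column
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def foveated_streams(frame):
    """Split one frame into (context, fovea) inputs.

    Assumption: context = whole frame at half resolution,
    fovea = full-resolution center crop of half the shorter side.
    """
    half = min(len(frame), len(frame[0])) // 2
    context = downsample_2x(frame)
    fovea = center_crop(frame, half)
    return context, fovea


# Toy 4x4 grayscale frame with pixel values 0..15.
frame = [[4 * r + c for c in range(4)] for r in range(4)]
context, fovea = foveated_streams(frame)
```

Under these assumptions each stream holds a quarter of the original pixels, so the two streams together carry half the input of a full-resolution frame. A smaller input of this kind is one plausible source of the training speed-up the abstract mentions.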

Pages: 1725-1732

Page count: 8
