Learning Deep Features for Discriminative Localization

Cited by: 6979
Authors
Zhou, Bolei [1]
Khosla, Aditya [1]
Lapedriza, Agata [1]
Oliva, Aude [1]
Torralba, Antonio [1]
Affiliation
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Source
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2016
Keywords
DOI
10.1109/CVPR.2016.319
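Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
140502 [Artificial intelligence];
Abstract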
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate in a variety of experiments that our network is able to localize the discriminative image regions despite just being trained for solving a classification task.
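The idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes a toy setting that is not taken from this record: the last convolutional layer yields K feature maps of size H x W, global average pooling reduces each map to one number, and a bias-free linear layer with weights of shape (C, K) produces class scores. Under those assumptions, weighting the feature maps by one class's weights gives a spatial map whose mean equals that class's score, which is why the same weights can point at discriminative regions. All function and variable names are hypothetical, and the random arrays only stand in for real network activations.

```python
import numpy as np

def global_average_pool(features):
    """Average each feature map over space: (K, H, W) -> (K,)."""
    return features.mean(axis=(1, 2))

def class_scores(features, weights):
    """Bias-free linear classifier on pooled features; weights is (C, K)."""
    return weights @ global_average_pool(features)

def class_activation_map(features, weights, class_idx):
    """Sum the feature maps weighted by one class's weights: -> (H, W)."""
    return np.tensordot(weights[class_idx], features, axes=1)

# Toy stand-ins for real activations and trained weights (assumptions).
rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))        # K=4 feature maps, 7x7 each
weights = rng.standard_normal((3, 4))   # C=3 classes

scores = class_scores(features, weights)
top = int(np.argmax(scores))
cam = class_activation_map(features, weights, top)

# Location of the strongest response for the predicted class.
peak = np.unravel_index(np.argmax(cam), cam.shape)
```

Because averaging is linear, `cam.mean()` matches `scores[top]` up to floating-point error; in practice the map would be upsampled to the input resolution before being thresholded into a box.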
Pages: 2921-2929
Number of pages: 9
Related papers
36 entries in total
[1]
[Anonymous], ARXIV14093964
[2]
[Anonymous], Simple baseline for visual question answering
[3]
[Anonymous], P CVPR
[4]
[Anonymous], ARXIV150300949
[5]
[Anonymous], 2014, Advances in neural information processing systems
[6]
Donahue J, 2014, PR MACH LEARN RES, V32
[7]
FlowNet: Learning Optical Flow with Convolutional Networks [J].
Dosovitskiy, Alexey; Fischer, Philipp; Ilg, Eddy; Haeusser, Philip; Hazirbas, Caner; Golkov, Vladimir; van der Smagt, Patrick; Cremers, Daniel; Brox, Thomas.
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2758-2766
[8]
Fan RE, 2008, J MACH LEARN RES, V9, P1871
[9]
Gavves E., 2014, INT J COMPUTER VISIO
[10]
Girshick R., 2014, IEEE C COMP VIS PATT, DOI 10.1109/CVPR.2014.81