Rich feature hierarchies for accurate object detection and semantic segmentation

Cited by: 12683
Authors
Girshick, Ross [1]
Donahue, Jeff [1]
Darrell, Trevor [1]
Malik, Jitendra [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2014
DOI
10.1109/CVPR.2014.81
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
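Note on the headline figure: the abstract reports a relative gain (more than 30%), not a gain in absolute percentage points. The short Python sketch below works through that arithmetic using only the two numbers stated in the abstract. The function names are illustrative and do not come from the paper or its released code, and the result is an upper bound implied by the abstract, not a figure quoted from it.

```python
def implied_previous_best(new_map: float, relative_gain: float) -> float:
    """Baseline mAP implied by a new mAP and a relative gain over that baseline."""
    return new_map / (1.0 + relative_gain)


def relative_improvement(new_map: float, old_map: float) -> float:
    """Relative gain of new_map over old_map, as a fraction of old_map."""
    return (new_map - old_map) / old_map


# Figures from the abstract: 53.3% mAP, "more than 30%" relative improvement.
bound = implied_previous_best(53.3, 0.30)
print(f"Implied previous best is below {bound:.1f}% mAP")  # prints 41.0
```

Read this way, the abstract implies that the previous best result on VOC 2012 was below roughly 41% mAP, that is, a gain of more than 12 absolute percentage points.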
Pages: 580-587
Page count: 8