Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

Cited by: 1956
Authors
Girshick, Ross [1]
Donahue, Jeff [2]
Darrell, Trevor [2]
Malik, Jitendra [2]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Funding
National Science Foundation (USA)
Keywords
Object recognition; detection; semantic segmentation; convolutional networks; deep learning; transfer learning; REPRESENTATION; HISTOGRAMS; GRADIENTS; FEATURES; SCENE
DOI
10.1109/TPAMI.2015.2437384
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN, or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
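The first idea in the abstract, scoring each bottom-up region proposal with a learned feature extractor, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the authors' released code: the helpers `crop`, `resize_nearest`, and `score_proposals` are hypothetical, the fixed-size nearest-neighbour resize stands in for whatever input preparation a real network needs, and the toy `mean_intensity` function stands in for a CNN.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive
Image = List[List[float]]         # grayscale image stored as rows of pixels


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region described by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def resize_nearest(patch: Image, size: int) -> Image:
    """Resize a patch to size x size with nearest-neighbour sampling."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def score_proposals(
    image: Image,
    proposals: Sequence[Box],
    extract_features: Callable[[Image], List[float]],  # stand-in for a CNN
    score_region: Callable[[List[float]], float],      # stand-in for a classifier
    size: int = 4,
) -> List[Tuple[Box, float]]:
    """Score every proposal independently and return them best-first."""
    results = []
    for box in proposals:
        patch = resize_nearest(crop(image, box), size)
        results.append((box, score_region(extract_features(patch))))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy usage: an 8x8 image containing one bright 4x4 square.
image = [[0.0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        image[y][x] = 1.0


def mean_intensity(patch: Image) -> List[float]:
    """Hypothetical one-dimensional 'feature': the mean pixel value."""
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels)]


ranked = score_proposals(
    image,
    proposals=[(0, 0, 4, 4), (2, 2, 6, 6)],
    extract_features=mean_intensity,
    score_region=lambda features: features[0],
)
```

In this toy setup the proposal that exactly covers the bright square ranks first. A real system would replace the two stand-in callables with a pre-trained, fine-tuned network and per-class scoring, which this sketch does not attempt.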
Pages: 142-158
Page count: 17