Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

Cited by: 1956
Authors
Girshick, Ross [1]
Donahue, Jeff [2]
Darrell, Trevor [2]
Malik, Jitendra [2]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
Funding
National Science Foundation (USA)
Keywords
Object recognition; detection; semantic segmentation; convolutional networks; deep learning; transfer learning; REPRESENTATION; HISTOGRAMS; GRADIENTS; FEATURES; SCENE;
DOI
10.1109/TPAMI.2015.2437384
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
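The "more than 50 percent relative" figure in the abstract is a relative gain, not a gain in percentage points. As a minimal sketch of that arithmetic (the function names are illustrative and not from the paper, and the abstract does not state the previous best mAP), the claim implies the previous best result on VOC 2012 was below about 41.6 percent:

```python
def relative_improvement(old_map: float, new_map: float) -> float:
    """Relative gain of new_map over old_map as a fraction (0.5 means 50 percent)."""
    if old_map <= 0:
        raise ValueError("old_map must be positive")
    return (new_map - old_map) / old_map


def implied_baseline_bound(new_map: float, min_relative_gain: float) -> float:
    """Upper bound on the baseline implied by a relative gain above min_relative_gain.

    Hypothetical helper: it only inverts the relative-gain formula.
    """
    return new_map / (1.0 + min_relative_gain)


# 62.4 percent mAP and a gain of more than 50 percent are taken from the abstract.
bound = implied_baseline_bound(62.4, 0.50)
print(round(bound, 1))
```

Any previous best strictly below this bound gives a relative gain strictly above 50 percent, which is all the abstract asserts.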
Pages: 142-158
Page count: 17