Region-Based Convolutional Networks for Accurate Object Detection and Segmentation

Cited by: 1956

Authors:

Girshick, Ross [1]

Donahue, Jeff [2]

Darrell, Trevor [2]

Malik, Jitendra [2]

Affiliations:

[1] Microsoft Res, Redmond, WA 98052 USA

[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2016 / Vol. 38 / No. 01

Funding:

National Science Foundation (USA);

Keywords:

Object recognition; detection; semantic segmentation; convolutional networks; deep learning; transfer learning; REPRESENTATION; HISTOGRAMS; GRADIENTS; FEATURES; SCENE;

DOI:

10.1109/TPAMI.2015.2437384

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.

Pages: 142-158

Page count: 17
