Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

被引:45256
作者
Ren, Shaoqing [1 ]
He, Kaiming [2 ]
Girshick, Ross [3 ]
Sun, Jian [2 ]
机构
[1] Univ Sci & Technol China, Hefei 230026, Anhui, Peoples R China
[2] Microsoft Res, Visual Comp Grp, Beijing 100080, Peoples R China
[3] Facebook AI Res, Seattle, WA 98109 USA
关键词
Object detection; region proposal; convolutional neural network;
D O I
10.1109/TPAMI.2016.2577031
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features-using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
引用
收藏
页码:1137 / 1149
页数:13
相关论文
共 40 条
  • [21] Dai JF, 2015, PROC CVPR IEEE, P3992, DOI 10.1109/CVPR.2015.7299025
  • [22] Scalable Object Detection using Deep Neural Networks
    Erhan, Dumitru
    Szegedy, Christian
    Toshev, Alexander
    Anguelov, Dragomir
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 2155 - 2162
  • [23] Everingham Everingham M. M., Int. J. Comput. Vis., V88 88, P303, DOI 10.1007/s11263-009-0275-4 10.1007/s11263-009-0275-4
  • [24] Object Detection with Discriminatively Trained Part-Based Models
    Felzenszwalb, Pedro F.
    Girshick, Ross B.
    McAllester, David
    Ramanan, Deva
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2010, 32 (09) : 1627 - 1645
  • [25] Fast R-CNN
    Girshick, Ross
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1440 - 1448
  • [26] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [27] He KM, 2014, LECT NOTES COMPUT SC, V8691, P346, DOI [arXiv:1406.4729, 10.1007/978-3-319-10578-9_23]
  • [28] Hoiem D, 2012, LECT NOTES COMPUT SC, V7574, P340, DOI 10.1007/978-3-642-33712-3_25
  • [29] What Makes for Effective Detection Proposals?
    Hosang, Jan
    Benenson, Rodrigo
    Dollar, Piotr
    Schiele, Bernt
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (04) : 814 - 830
  • [30] Caffe: Convolutional Architecture for Fast Feature Embedding
    Jia, Yangqing
    Shelhamer, Evan
    Donahue, Jeff
    Karayev, Sergey
    Long, Jonathan
    Girshick, Ross
    Guadarrama, Sergio
    Darrell, Trevor
    [J]. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, : 675 - 678