Acquisition of Localization Confidence for Accurate Object Detection

Cited by: 648
Authors
Jiang, Borui [1 ,3 ]
Luo, Ruixuan [1 ,3 ]
Mao, Jiayuan [2 ,4 ]
Xiao, Tete [1 ,3 ]
Jiang, Yuning [4 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Beijing, Peoples R China
[2] Tsinghua Univ, Inst Interdisciplinary Informat Sci, ITCS, Beijing, Peoples R China
[3] Megvii Inc Face, Beijing, Peoples R China
[4] Toutiao AI Lab, Beijing, Peoples R China
Source
COMPUTER VISION - ECCV 2018, PT XIV | 2018 / Vol. 11218
Keywords
Object localization; Bounding box regression; Non-maximum suppression;
DOI
10.1007/978-3-030-01264-9_48
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This makes properly localized bounding boxes degenerate during iterative regression or even suppressed during NMS. In the paper we propose IoU-Net learning to predict the IoU between each detected bounding box and the matched ground-truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
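The abstract's idea of ranking boxes by localization confidence during NMS can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the (x1, y1, x2, y2) box layout, the 0.5 threshold, and the step that passes the highest classification score to the kept box are all assumptions made for illustration, and `loc_conf` stands in for whatever IoU the network would predict.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_by_localization_confidence(boxes, cls_scores, loc_conf, thresh=0.5):
    """Greedy NMS that ranks boxes by predicted IoU instead of class score.

    Returns a list of (box_index, class_score) pairs for the kept boxes.
    """
    # Rank candidates by localization confidence, highest first.
    remaining = sorted(range(len(boxes)), key=lambda i: loc_conf[i], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        score = cls_scores[best]
        survivors = []
        for j in remaining:
            if iou(boxes[best], boxes[j]) > thresh:
                # Assumption of this sketch: the kept box takes over the
                # highest class score among the boxes it suppresses.
                score = max(score, cls_scores[j])
            else:
                survivors.append(j)
        remaining = survivors
        kept.append((best, score))
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
cls_scores = [0.9, 0.6, 0.8]
loc_conf = [0.5, 0.9, 0.7]  # hypothetical predicted IoUs
result = nms_by_localization_confidence(boxes, cls_scores, loc_conf)
```

In this toy input, classic NMS ranked by class score would keep box 0 and discard box 1. Ranking by the hypothetical localization confidence keeps box 1 instead, which is the behaviour the abstract describes as preserving accurately localized boxes.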
Pages: 816 - 832
Number of pages: 17
Related papers
31 in total
  • [1] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
    Bell, Sean
    Zitnick, C. Lawrence
    Bala, Kavita
    Girshick, Ross
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2874 - 2883
  • [2] Bodla N., 2017, IEEE I CONF COMP VIS, DOI 10.1109/ICCV.2017.593
  • [3] Cai Z, 2017, PROC CVPR IEEE, DOI 10.1109/CVPR.2018.00644
  • [4] Histograms of oriented gradients for human detection
    Dalal, N
    Triggs, B
    [J]. 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 886 - 893
  • [5] Gidaris S., 2016, arXiv preprint arXiv:1606.04446
  • [6] Object detection via a multi-region & semantic segmentation-aware CNN model
    Gidaris, Spyros
    Komodakis, Nikos
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1134 - 1142
  • [7] Girshick R, 2015, PROC ADVNEURAL INF P
  • [8] Rich feature hierarchies for accurate object detection and semantic segmentation
    Girshick, Ross
    Donahue, Jeff
    Darrell, Trevor
    Malik, Jitendra
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 580 - 587
  • [9] He K., 2017, P IEEE INT C COMPUTE, V2017, P2980
  • [10] Learning non-maximum suppression
    Hosang, Jan
    Benenson, Rodrigo
    Schiele, Bernt
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6469 - 6477