Acquisition of Localization Confidence for Accurate Object Detection

Cited by: 648

Authors:

Jiang, Borui ^{[1,3]}

Luo, Ruixuan ^{[1,3]}

Mao, Jiayuan ^{[2,4]}

Xiao, Tete ^{[1,3]}

Jiang, Yuning ^{[4]}

Affiliations:

[1] Peking Univ, Sch Elect Engn & Comp Sci, Beijing, Peoples R China

[2] Tsinghua Univ, Inst Interdisciplinary Informat Sci, ITCS, Beijing, Peoples R China

[3] Megvii Inc (Face++), Beijing, Peoples R China

[4] Toutiao AI Lab, Beijing, Peoples R China

Source:

COMPUTER VISION - ECCV 2018, PT XIV | 2018 / Vol. 11218

Keywords:

Object localization; Bounding box regression; Non-maximum suppression;

DOI:

10.1007/978-3-030-01264-9_48

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. This makes properly localized bounding boxes degenerate during iterative regression or even suppressed during NMS. In the paper we propose IoU-Net learning to predict the IoU between each detected bounding box and the matched ground-truth. The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, where the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
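The abstract states that a predicted IoU can serve as a localization confidence that lets NMS preserve accurately localized boxes. The following is a minimal sketch of that idea, not the authors' implementation: it computes IoU between axis-aligned boxes and runs a greedy NMS that ranks candidates by a per-box localization confidence instead of the classification score. The names `iou`, `nms_by_localization_confidence`, `loc_conf`, and the threshold value are illustrative assumptions; the record does not specify the exact procedure used in the paper.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_by_localization_confidence(
    boxes: Sequence[Box],
    loc_conf: Sequence[float],  # hypothetical predicted IoU per box
    threshold: float = 0.5,     # assumed overlap threshold
) -> List[int]:
    """Greedy NMS that ranks boxes by localization confidence.

    Returns the indices of the kept boxes, best-localized first.
    """
    order = sorted(range(len(boxes)), key=lambda i: loc_conf[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better-localized kept box.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
loc_conf = [0.6, 0.9, 0.8]
kept = nms_by_localization_confidence(boxes, loc_conf)
```

In this toy input the first two boxes overlap heavily, so only the one with the higher localization confidence survives, regardless of what its classification score might have been.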

Pages: 816-832

Page count: 17
