Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection

Cited by: 20
Authors
Wang, Ye [1]
Liu, Zhenyi [1]
Deng, Weiwen [1,2]
Affiliations
[1] Jilin Univ, State Key Lab Automot Simulat & Control, Changchun 130025, Jilin, Peoples R China
[2] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
vehicle detection; anchor generation optimization; receptive field matching; ROI assignment; RECOGNITION;
DOI
10.3390/s19051089
CLC Number
O65 [Analytical Chemistry];
Subject Classification Code
070302 ; 081704 ;
Abstract
Region proposal network (RPN) based object detection, such as Faster R-CNN (Faster Regions with CNN features), has gained considerable attention due to its high accuracy and fast speed. However, it leaves room for improvement in special application scenarios such as on-board vehicle detection. The original RPN places multiscale anchors uniformly on each pixel of the last feature map and classifies each anchor as foreground or background using a single pixel of that feature map. The receptive field of each pixel in the last feature map is fixed in the original Faster R-CNN and does not coincide with the anchor size. Hence, only part of a large vehicle is visible to the classifier, while the feature for a small vehicle contains too much irrelevant information, which reduces detection accuracy. Furthermore, perspective projection makes the size of a vehicle's bounding box depend on its position in the image, which reduces the effectiveness and accuracy of the uniform anchor generation method and degrades both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROIs) are generated. The ROI pooling layer projects each ROI onto the last feature map and forms a new fixed-size feature map for final classification and box regression. The number of feature-map pixels covered by the projected region also influences detection performance, but this was not accurately controlled in previous works. In this paper, the original Faster R-CNN is optimized specifically for on-board vehicle detection to address the above problems. The proposed method is tested on the KITTI dataset, and the results show a significant improvement without elaborate parameter tuning or training tricks. The method can also be applied to other objects with obvious foreshortening effects, such as on-board pedestrian detection. Because the basic idea does not depend on a concrete implementation, most deep-learning-based object detectors with multiscale feature maps can be optimized with it.
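To make the anchor/receptive-field mismatch described in the abstract concrete, the following is a minimal NumPy sketch of the uniform anchor generation used by the original RPN, not the authors' optimized method. The stride of 16, scales of 8/16/32, and aspect ratios of 0.5/1/2 are the defaults reported for Faster R-CNN with VGG-16 and are assumed here purely for illustration.

```python
import numpy as np

def generate_uniform_anchors(feat_h, feat_w, stride=16,
                             scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Place the same set of multiscale anchors at every pixel of the last
    feature map, as in the original RPN.  Returns an array of shape
    (feat_h * feat_w * len(scales) * len(ratios), 4) with boxes given as
    (x1, y1, x2, y2) in input-image coordinates."""
    base_anchors = []
    for scale in scales:
        for ratio in ratios:
            # All anchors of one scale share the same area (stride * scale)^2;
            # the aspect ratio only redistributes it between width and height.
            w = stride * scale * np.sqrt(1.0 / ratio)
            h = stride * scale * np.sqrt(ratio)
            base_anchors.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    base_anchors = np.asarray(base_anchors)                      # (A, 4)

    # Anchor centres: one per feature-map pixel, spaced by the total stride,
    # i.e. the uniform placement that ignores the foreshortening effect.
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)  # (H*W, 1, 4)
    return (shifts + base_anchors).reshape(-1, 4)

if __name__ == "__main__":
    # A 600 x 1000 input downsampled by 16 gives roughly a 38 x 63 feature map.
    anchors = generate_uniform_anchors(feat_h=38, feat_w=63)
    print(anchors.shape)   # (21546, 4) = 38 * 63 * 9 anchors
```

With these default settings the largest anchor spans 16 * 32 = 512 px, while the receptive field of a VGG-16 conv5_3 pixel is only about 228 px, so a single feature-map pixel never sees a large vehicle in full; this is the mismatch that the paper's anchor generation optimization and receptive-field matching address.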
Pages: 16