On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data

Cited by: 115
Authors
Carranza-Garcia, Manuel [1]
Torres-Mateo, Jesus [1]
Lara-Benitez, Pedro [1]
Garcia-Gutierrez, Jorge [1]
Affiliations
[1] Univ Seville, Div Comp Sci, ES-41012 Seville, Spain
Keywords
autonomous vehicles; convolutional neural networks; deep learning; object detection; transfer learning
DOI
10.3390/rs13010089
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification codes
08; 0830
Abstract
Object detection using remote sensing data is a key task of the perception systems of self-driving vehicles. While many generic deep learning architectures have been proposed for this problem, there is little guidance on their suitability when using them in a particular scenario such as autonomous driving. In this work, we aim to assess the performance of existing 2D detection systems on a multi-class problem (vehicles, pedestrians, and cyclists) with images obtained from the on-board camera sensors of a car. We evaluate several one-stage (RetinaNet, FCOS, and YOLOv3) and two-stage (Faster R-CNN) deep learning meta-architectures under different image resolutions and feature extractors (ResNet, ResNeXt, Res2Net, DarkNet, and MobileNet). These models are trained using transfer learning and compared in terms of both precision and efficiency, with special attention to the real-time requirements of this context. For the experimental study, we use the Waymo Open Dataset, which is the largest existing benchmark. Despite the rising popularity of one-stage detectors, our findings show that two-stage detectors still provide the most robust performance. Faster R-CNN models outperform one-stage detectors in accuracy, being also more reliable in the detection of minority classes. Faster R-CNN Res2Net-101 achieves the best speed/accuracy tradeoff but needs lower resolution images to reach real-time speed. Furthermore, the anchor-free FCOS detector is a slightly faster alternative to RetinaNet, with similar precision and lower memory usage.
Pages: 1-23
Number of pages: 23
Related papers
46 in total
  • [1] [Anonymous], 2020, REMOTE SENS BASEL, DOI 10.1109/TGRS.2020.2991985
  • [2] Evaluation of deep neural networks for traffic sign detection systems
    Arcos-Garcia, Alvaro
    Alvarez-Garcia, Juan A.
    Soria-Morillo, Luis M.
    [J]. NEUROCOMPUTING, 2018, 316 : 332 - 344
  • [3] Caesar Holger, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Proceedings, P11618, DOI 10.1109/CVPR42600.2020.01164
  • [4] A Framework for Evaluating Land Use and Land Cover Classification Using Convolutional Neural Networks
    Carranza-Garcia, Manuel
    Garcia-Gutierrez, Jorge
    Riquelme, Jose C.
    [J]. REMOTE SENSING, 2019, 11 (03)
  • [5] Ensemble Methods for Object Detection
    Casado-Garcia, Angela
    Heras, Jonathan
    [J]. ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2688 - 2695
  • [6] Chen Kai, 2019, arXiv preprint arXiv:1906.07155
  • [7] Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges
    Feng, Di
    Haase-Schutz, Christian
    Rosenbaum, Lars
    Hertlein, Heinz
    Glaser, Claudius
    Timm, Fabian
    Wiesbeck, Werner
    Dietmayer, Klaus
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2021, 22 (03) : 1341 - 1360
  • [8] Res2Net: A New Multi-Scale Backbone Architecture
    Gao, Shang-Hua
    Cheng, Ming-Ming
    Zhao, Kai
    Zhang, Xin-Yu
    Yang, Ming-Hsuan
    Torr, Philip
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (02) : 652 - 662
  • [9] Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074
  • [10] Rich feature hierarchies for accurate object detection and semantic segmentation
    Girshick, Ross
    Donahue, Jeff
    Darrell, Trevor
    Malik, Jitendra
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 580 - 587