A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

被引:1074
作者
Cai, Zhaowei [1 ]
Fan, Quanfu [2 ]
Feris, Rogerio S. [2 ]
Vasconcelos, Nuno [1 ]
机构
[1] Univ Calif San Diego, SVCL, San Diego, CA 92103 USA
[2] IBM TJ Watson Res, Yorktown Hts, NY USA
来源
COMPUTER VISION - ECCV 2016, PT IV | 2016年 / 9908卷
基金
美国国家科学基金会;
关键词
Object detection; Multi-scale; Unified neural network;
D O I
10.1007/978-3-319-46493-0_22
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.
引用
收藏
页码:354 / 370
页数:17
相关论文
共 43 条
  • [31] Saberian M, 2014, J MACH LEARN RES, V15, P2569
  • [32] Shen C., 2014, ABS14095209 CORR
  • [33] Simonyan K, 2015, Arxiv, DOI arXiv:1409.1556
  • [34] Szegedy C, 2014, Arxiv, DOI [arXiv:1312.6199, DOI 10.1109/CVPR.2015.7298594]
  • [35] Deep Learning Strong Parts for Pedestrian Detection
    Tian, Yonglong
    Luo, Ping
    Wang, Xiaogang
    Tang, Xiaoou
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1904 - 1912
  • [36] van de Sande KEA, 2011, IEEE I CONF COMP VIS, P1879, DOI 10.1109/ICCV.2011.6126456
  • [37] Robust real-time face detection
    Viola, P
    Jones, MJ
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2004, 57 (02) : 137 - 154
  • [38] Regionlets for Generic Object Detection
    Wang, Xiaoyu
    Yang, Ming
    Zhu, Shenghuo
    Lin, Yuanqing
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 17 - 24
  • [39] Xiang Y, 2015, PROC CVPR IEEE, P1903, DOI 10.1109/CVPR.2015.7298800
  • [40] Xie SN, 2015, IEEE I CONF COMP VIS, P1395, DOI [10.1109/ICCV.2015.164, 10.1007/s11263-017-1004-z]