Res2Net: A New Multi-Scale Backbone Architecture

Cited by: 1982
Authors
Gao, Shang-Hua [1 ]
Cheng, Ming-Ming [1 ]
Zhao, Kai [1 ]
Zhang, Xin-Yu [2 ]
Yang, Ming-Hsuan [1 ]
Torr, Philip [3 ]
Affiliations
[1] Nankai Univ, Coll Comp Sci, TKLNDST, Tianjin 300350, Peoples R China
[2] UC Merced, Merced, CA 95343 USA
[3] Univ Oxford, Oxford OX1 2JD, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Feature extraction; Task analysis; Object detection; Semantics; Computer architecture; Kernel; Convolution; Multi-scale; deep learning; SALIENT OBJECT DETECTION;
DOI
10.1109/TPAMI.2019.2938758
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of Res2Net over state-of-the-art baseline methods. The source code and trained models are available at https://mmcheng.net/res2net/.
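The hierarchical residual-like connections described in the abstract can be sketched as a data-flow toy: the block's input channels are split into s groups, the first group passes through untouched, and each later group is processed after receiving the previous group's output, so effective receptive fields grow group by group. This is a minimal numpy sketch under stated assumptions: an identity function stands in for the per-group 3x3 convolution, and the names `res2net_split_flow` and `num_scales` are illustrative, not the authors' API.

```python
import numpy as np

def res2net_split_flow(x, num_scales=4, conv=lambda t: t):
    """Toy sketch of the Res2Net hierarchical flow inside one block.

    x: feature map of shape (channels, H, W); channels must divide num_scales.
    conv: stand-in for each group's 3x3 convolution (identity here for clarity).
    """
    splits = np.split(x, num_scales, axis=0)   # split channels into s groups
    outputs = [splits[0]]                      # y1 = x1: first group is passed through
    prev = None
    for i in range(1, num_scales):
        # y2 = K2(x2); yi = Ki(xi + y_{i-1}) for i > 2:
        # each group also sees the output of the previous group.
        inp = splits[i] if prev is None else splits[i] + prev
        prev = conv(inp)
        outputs.append(prev)
    # In the real block the groups are concatenated and fused by a 1x1 conv.
    return np.concatenate(outputs, axis=0)
```

With the identity stand-in, group i accumulates the contributions of all earlier processed groups, which is exactly the growing-receptive-field effect the abstract refers to; in a trained network each `conv` would enlarge the spatial receptive field as well.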
Pages: 652-662
Page count: 11