Res2Net: A New Multi-Scale Backbone Architecture

Cited by: 1982
Authors
Gao, Shang-Hua [1 ]
Cheng, Ming-Ming [1 ]
Zhao, Kai [1 ]
Zhang, Xin-Yu [2 ]
Yang, Ming-Hsuan [1 ]
Torr, Philip [3 ]
Affiliations
[1] Nankai Univ, Coll Comp Sci, TKLNDST, Tianjin 300350, Peoples R China
[2] UC Merced, Merced, CA 95343 USA
[3] Univ Oxford, Oxford OX1 2JD, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Feature extraction; Task analysis; Object detection; Semantics; Computer architecture; Kernel; Convolution; Multi-scale; deep learning; SALIENT OBJECT DETECTION;
DOI
10.1109/TPAMI.2019.2938758
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of Res2Net over state-of-the-art baseline methods. The source code and trained models are available at https://mmcheng.net/res2net/.
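The hierarchical residual-like connections described in the abstract can be sketched as a data-flow toy: the block's input channels are split into s groups, the first group passes through untouched, and each later group is processed after receiving the previous group's output, so effective receptive fields grow group by group. This is a minimal numpy sketch under stated assumptions: an identity function stands in for the per-group 3x3 convolution, and the names `res2net_split_flow` and `num_scales` are illustrative, not the authors' API.

```python
import numpy as np

def res2net_split_flow(x, num_scales=4, conv=lambda t: t):
    """Toy sketch of the Res2Net hierarchical flow inside one block.

    x: feature map of shape (channels, H, W); channels must divide num_scales.
    conv: stand-in for each group's 3x3 convolution (identity here for clarity).
    """
    splits = np.split(x, num_scales, axis=0)   # split channels into s groups
    outputs = [splits[0]]                      # y1 = x1: first group is passed through
    prev = None
    for i in range(1, num_scales):
        # y2 = K2(x2); yi = Ki(xi + y_{i-1}) for i > 2:
        # each group also sees the output of the previous group.
        inp = splits[i] if prev is None else splits[i] + prev
        prev = conv(inp)
        outputs.append(prev)
    # In the real block the groups are concatenated and fused by a 1x1 conv.
    return np.concatenate(outputs, axis=0)
```

With the identity stand-in, group i accumulates the contributions of all earlier processed groups, which is exactly the growing-receptive-field effect the abstract refers to; in a trained network each `conv` would enlarge the spatial receptive field as well.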
Pages: 652-662
Page count: 11