Hybrid Task Cascade for Instance Segmentation

Cited by: 1299

Authors:

Chen, Kai [1]

Pang, Jiangmiao [2,3]

Wang, Jiaqi [1]

Xiong, Yu [1]

Li, Xiaoxiao [1]

Sun, Shuyang [4]

Feng, Wansen [2]

Liu, Ziwei [1]

Shi, Jianping [2]

Ouyang, Wanli [4]

Loy, Chen Change [5]

Lin, Dahua [1]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] SenseTime Res, Hong Kong, Peoples R China

[3] Zhejiang Univ, Hangzhou, Peoples R China

[4] Univ Sydney, Sydney, NSW, Australia

[5] Nanyang Technol Univ, Singapore, Singapore

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Keywords:

DOI:

10.1109/CVPR.2019.00511

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade to instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN brings only limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguish hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features in each stage. Without bells and whistles, a single HTC obtains 38.4% mask AP, a 1.5% improvement over a strong Cascade Mask R-CNN baseline on the MSCOCO dataset. Moreover, our overall system achieves 48.6 mask AP on the test-challenge split, ranking 1st in the COCO 2018 Challenge Object Detection Task.
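The interleaving described in point (1) of the abstract can be illustrated with a small control-flow sketch. This is only one plausible reading of the abstract, not the authors' implementation: the function `interleaved_cascade`, the stand-in heads `shrink` and `count_masks`, and the three-stage setup are all hypothetical, and the fully convolutional context branch from point (2) is left out.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def interleaved_cascade(
    boxes: List[Box],
    box_heads: List[Callable[[List[Box]], List[Box]]],
    mask_heads: List[Callable[[List[Box], object], object]],
):
    """Alternate box refinement and mask prediction, one pair per stage.

    Hypothetical sketch: each mask head sees the boxes refined in the same
    stage plus whatever the previous stage's mask head produced.
    """
    mask_state = None  # no mask information exists before the first stage
    trace = []         # records the order in which the heads run
    for stage, (box_head, mask_head) in enumerate(zip(box_heads, mask_heads), start=1):
        boxes = box_head(boxes)                    # detection step of this stage
        trace.append(f"box{stage}")
        mask_state = mask_head(boxes, mask_state)  # segmentation step, using refined boxes
        trace.append(f"mask{stage}")
    return boxes, mask_state, trace


# Hypothetical stand-ins for learned heads, used only to make the sketch runnable.
def shrink(boxes: List[Box]) -> List[Box]:
    """Pretend 'refinement': tighten every box by one unit on each side."""
    return [(x1 + 1, y1 + 1, x2 - 1, y2 - 1) for x1, y1, x2, y2 in boxes]


def count_masks(boxes: List[Box], prev) -> int:
    """Pretend 'mask head': accumulate a count so the stage-to-stage flow is visible."""
    return (prev or 0) + len(boxes)


final_boxes, final_masks, order = interleaved_cascade(
    [(0.0, 0.0, 10.0, 10.0)], [shrink] * 3, [count_masks] * 3
)
print(order)
```

The printed order alternates box and mask steps, in contrast to running a full box cascade first and a separate mask cascade afterwards.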

Pages: 4969-4978

Page count: 10
