Hybrid Task Cascade for Instance Segmentation

Cited by: 1299

Authors:

Chen, Kai [1]

Pang, Jiangmiao [2,3]

Wang, Jiaqi [1]

Xiong, Yu [1]

Li, Xiaoxiao [1]

Sun, Shuyang [4]

Feng, Wansen [2]

Liu, Ziwei [1]

Shi, Jianping [2]

Ouyang, Wanli [4]

Loy, Chen Change [5]

Lin, Dahua [1]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] SenseTime Res, Hong Kong, Peoples R China

[3] Zhejiang Univ, Hangzhou, Peoples R China

[4] Univ Sydney, Sydney, NSW, Australia

[5] Nanyang Technol Univ, Singapore, Singapore

Source:

2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019

Keywords:

DOI:

10.1109/CVPR.2019.00511

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cascade is a classic yet powerful architecture that has boosted performance on various tasks. However, how to introduce cascade to instance segmentation remains an open question. A simple combination of Cascade R-CNN and Mask R-CNN brings only limited gain. In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation. In this work, we propose a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on these two tasks separately, it interweaves them for joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which can help distinguish hard foreground from cluttered background. Overall, this framework can learn more discriminative features progressively while integrating complementary features in each stage. Without bells and whistles, a single HTC obtains 38.4%, a 1.5% improvement over a strong Cascade Mask R-CNN baseline on the MS COCO dataset. Moreover, our overall system achieves 48.6 mask AP on the test-challenge split, ranking 1st in the COCO 2018 Challenge Object Detection Task.
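The interleaving described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function name `run_hybrid_cascade`, the head signatures, the choice of three stages, the box-before-mask ordering within a stage, and the forwarding of the previous stage's masks are all assumptions made for illustration, and the heads are toy stand-ins with no learning involved.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical signatures, chosen only for this sketch.
BoxHead = Callable[[Any, Any, List[Any]], List[Any]]
MaskHead = Callable[[Any, Any, List[Any], Optional[List[Any]]], List[Any]]


def run_hybrid_cascade(
    features: Any,
    proposals: List[Any],
    box_heads: Sequence[BoxHead],
    mask_heads: Sequence[MaskHead],
    context_branch: Callable[[Any], Any],
) -> Tuple[List[Any], Optional[List[Any]]]:
    """Alternate box and mask refinement stage by stage (schematic only)."""
    if len(box_heads) != len(mask_heads):
        raise ValueError("expected one box head and one mask head per stage")
    # Spatial context is computed once and shared by every stage (assumption).
    context = context_branch(features)
    boxes, masks = proposals, None
    for box_head, mask_head in zip(box_heads, mask_heads):
        # Assumed ordering: refine boxes first, then predict masks on them.
        boxes = box_head(features, context, boxes)
        # Assumed: the previous stage's masks are passed to the next mask head.
        masks = mask_head(features, context, boxes, masks)
    return boxes, masks


# Toy heads that only record the call order and tag their outputs.
call_log: List[str] = []


def make_box_head(stage: int) -> BoxHead:
    def head(features, context, boxes):
        call_log.append(f"box{stage}")
        return [b + 1 for b in boxes]  # stand-in for a box refinement step
    return head


def make_mask_head(stage: int) -> MaskHead:
    def head(features, context, boxes, prev_masks):
        call_log.append(f"mask{stage}")
        return [f"mask(stage={stage}, box={b})" for b in boxes]
    return head


def toy_context(features):
    call_log.append("context")
    return "ctx"


final_boxes, final_masks = run_hybrid_cascade(
    features="feat",
    proposals=[0, 10],
    box_heads=[make_box_head(i) for i in range(3)],
    mask_heads=[make_mask_head(i) for i in range(3)],
    context_branch=toy_context,
)
print(call_log)
```

In this toy run the log alternates between box and mask calls, which is the point of contrast with a design that runs a full box cascade first and attaches mask prediction separately.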


Pages: 4969-4978

Page count: 10
