Mask R-CNN

被引：299

作者：

He, Kaiming ^{[1
]}

Gkioxari, Georgia ^{[1
]}

Dollar, Piotr ^{[1
]}

Girshick, Ross ^{[1
]}

机构：

[1] Facebook AI Res, Menlo Pk, CA 94025 USA

来源：

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2020年 / 42卷 / 02期

关键词：

Task analysis; Semantics; Feature extraction; Object detection; Proposals; Image segmentation; Quantization (signal); Instance segmentation; object detection; pose estimation; convolutional neural network;

D O I：

10.1109/TPAMI.2018.2844175

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron.

引用

页码：386 / 397

页数：12

共 45 条

[1] 2D Human Pose Estimation: New Benchmark and State of the Art Analysis
Andriluka, Mykhaylo
Pishchulin, Leonid
Gehler, Peter
Schiele, Bernt
[J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3686 - 3693
[2] ImageNet Classification with Deep Convolutional Neural Networks
Krizhevsky, Alex
Sutskever, Ilya
Hinton, Geoffrey E.
[J]. COMMUNICATIONS OF THE ACM, 2017, 60 (06) : 84 - 90
[3] [Anonymous], 2016, ARXIV161206851
[4] Multiscale Combinatorial Grouping
Arbelaez, Pablo
Pont-Tuset, Jordi
Barron, Jonathan T.
Marques, Ferran
Malik, Jitendra
[J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 328 - 335
[5] Pixelwise Instance Segmentation with a Dynamically Instantiated Network
Arnab, Anurag
Torr, Philip H. S.
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 879 - 888
[6] Deep Watershed Transform for Instance Segmentation
Bai, Min
Urtasun, Raquel
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 2858 - 2866
[7] Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
Bell, Sean
Zitnick, C. Lawrence
Bala, Kavita
Girshick, Ross
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2874 - 2883
[8] Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Cao, Zhe
Simon, Tomas
Wei, Shih-En
Sheikh, Yaser
[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1302 - 1310
[9] The Cityscapes Dataset for Semantic Urban Scene Understanding
Cordts, Marius
Omran, Mohamed
Ramos, Sebastian
Rehfeld, Timo
Enzweiler, Markus
Benenson, Rodrigo
Franke, Uwe
Roth, Stefan
Schiele, Bernt
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3213 - 3223
[10] Instance-aware Semantic Segmentation via Multi-task Network Cascades
Dai, Jifeng
He, Kaiming
Sun, Jian
[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 3150 - 3158

← 1 2 3 4 5 →