SSD: Single Shot MultiBox Detector

Cited by: 22362

Authors:

Liu, Wei [1]

Anguelov, Dragomir [2]

Erhan, Dumitru [3]

Szegedy, Christian [3]

Reed, Scott [4]

Fu, Cheng-Yang [1]

Berg, Alexander C. [1]

Affiliations:

[1] Univ N Carolina, Chapel Hill, NC 27514 USA

[2] Zoox Inc, Palo Alto, CA USA

[3] Google Inc, Mountain View, CA USA

[4] Univ Michigan, Ann Arbor, MI 48109 USA

Source:

COMPUTER VISION - ECCV 2016, PT I | 2016 / Volume 9905

Funding:

National Science Foundation (USA);

Keywords:

Real-time object detection; Convolutional neural network;

DOI:

10.1007/978-3-319-46448-0_2

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300 × 300 input, SSD achieves 74.3% mAP on VOC2007 test at 59 FPS on an Nvidia Titan X, and for 512 × 512 input, SSD achieves 76.9% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Compared to other single-stage methods, SSD has much better accuracy even with a smaller input image size.
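The idea of default boxes "over different aspect ratios and scales per feature map location" can be illustrated with a minimal sketch. The function name `default_boxes`, the feature map sizes, the scales, and the aspect ratios below are all hypothetical values chosen for illustration. They are not taken from the abstract and are not the authors' configuration. The sketch only shows how a grid of box centers is combined with a per-map scale and a set of aspect ratios.

```python
import math


def default_boxes(fmap_sizes, scales, aspect_ratios):
    """Build default boxes as (cx, cy, w, h) in normalized [0, 1] coordinates.

    Hypothetical helper: one scale per square feature map, and the same
    aspect ratios at every location (an illustrative simplification).
    """
    boxes = []
    for fmap_size, scale in zip(fmap_sizes, scales):
        for row in range(fmap_size):
            for col in range(fmap_size):
                # Center of the current feature map cell in image coordinates.
                cx = (col + 0.5) / fmap_size
                cy = (row + 0.5) / fmap_size
                for ratio in aspect_ratios:
                    # Width/height chosen so that w / h == ratio and
                    # w * h == scale ** 2 (area is fixed per feature map).
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    boxes.append((cx, cy, w, h))
    return boxes


# Two toy feature maps (4x4 and 2x2) with three aspect ratios each.
boxes = default_boxes(
    fmap_sizes=[4, 2],
    scales=[0.2, 0.5],
    aspect_ratios=[1.0, 2.0, 0.5],
)
print(len(boxes))  # (4*4 + 2*2) * 3 = 60
```

In this sketch the coarser 2x2 map uses the larger scale, which mirrors the abstract's point that feature maps of different resolutions handle objects of different sizes. At prediction time, a detector of this kind would attach per-category scores and box adjustments to each of these default boxes.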


Pages: 21-37

Page count: 17
