Monocular Pedestrian Detection: Survey and Experiments

Cited by: 795

Authors:

Enzweiler, Markus ^{[1]}

Gavrila, Dariu M. ^{[2,3]}

Affiliations:

[1] Heidelberg Univ, Dept Math & Comp Sci, Image & Pattern Anal Grp, D-69115 Heidelberg, Germany

[2] Univ Amsterdam, Intelligent Syst Lab, Fac Sci, NL-1098 SJ Amsterdam, Netherlands

[3] Daimler AG Grp Res, Assistance Syst & Chassis, Environm Percept Dept, D-89081 Ulm, Germany

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2009 / Vol. 31 / No. 12

Keywords:

Pedestrian detection; survey; performance analysis; benchmarking; TRACKING; MULTIPLE; IMAGES; MOTION; MODEL; RECOGNITION; COMBINATION; HUMANS; PEOPLE;

DOI:

10.1109/TPAMI.2008.260

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance, and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspectives. The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade [74], HOG/linSVM [11], NN/LRF [75], and combined shape-texture detection [23]. Experiments are performed on an extensive data set captured onboard a vehicle driving through an urban environment. The data set includes many thousands of training samples as well as a 27-minute test sequence involving more than 20,000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection onboard a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The data set (8.5 GB) is made public for benchmarking purposes.

Pages: 2179-2195

Page count: 17

References: 83 in total

[1] Agarwal, S; Awan, A; Roth, D. Learning to detect objects in images via a sparse, part-based representation [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004, 26 (11): 1475-1490

[2] [Anonymous], P IEEE INT VEH S

[3] [Anonymous], P IEEE INT C COMP VI

[4] [Anonymous], P INT C PATT REC

[5] [Anonymous], 2006, P IEEE INT C COMP VI

[6] [Anonymous], P EUR C COMP VIS

[7] [Anonymous], INRIA PERS DAT

[8] [Anonymous], P INT C COMP VIS

[9] [Anonymous], MIT CBCL PED DAT

[10] [Anonymous], 2008, P IEEE INT C COMP VI
