Semantic object classes in video: A high-definition ground truth database

Cited by: 936
Authors
Brostow, Gabriel J. [1, 2]
Fauqueur, Julien [1]
Cipolla, Roberto [1]
Affiliations
[1] Univ Cambridge, Comp Vis Grp, Cambridge CB2 1TN, England
[2] Swiss Fed Inst Technol, Comp Vis & Geometry Grp, Zurich, Switzerland
Keywords
Object recognition; Video database; Video understanding; Semantic segmentation; Label propagation; SEGMENTATION;
DOI
10.1016/j.patrec.2008.04.005
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual object analysis researchers are increasingly experimenting with video, because it is expected that motion cues should help with detection, recognition, and other analysis tasks. This paper presents the Cambridge-driving Labeled Video Database (CamVid) as the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes. Over 10 min of high quality 30 Hz footage is being provided, with corresponding semantically labeled images at 1 Hz and in part, 15 Hz. The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality and large resolution color video images in the database represent valuable extended duration digitized footage to those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we present custom-made labeling software for assisting users who wish to paint precise class-labels for other images and videos. We evaluate the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation. (C) 2008 Elsevier B.V. All rights reserved.
Pages: 88-97
Number of pages: 10