Surveillance Video Parsing with Single Frame Supervision

Cited by: 35
Authors
Liu, Si [1]
Wang, Changhu [2]
Qian, Ruihe [1]
Yu, Han [1]
Bao, Renda [1]
Sun, Yao [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100093, Peoples R China
[2] Toutiao AI Lab, Beijing, Peoples R China
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR.2017.114
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Surveillance video parsing, which segments video frames into labeled regions such as face, pants, and left-leg, has wide applications [41, 8]. However, annotating every frame pixel by pixel is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method that requires only one labeled frame per video in the training stage. To parse a particular frame, the video segment preceding it is considered jointly. SVP (i) roughly parses the frames within the video segment, (ii) estimates the optical flow between frames, and (iii) fuses the rough parsing results, warped by the optical flow, to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation, and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over state-of-the-art methods. The collected video parsing datasets can be downloaded from http://liusi-group.com/projects/SVP for further study.
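The warp-and-fuse idea in steps (i) to (iii) can be illustrated with a toy sketch. This is not the authors' network: the rough parsing maps and the flow field are given as plain inputs here, the nearest-neighbour warp and the simple averaging stand in for the learned components, and every name below (`warp_nearest`, `fuse_mean`, `argmax_labels`) is hypothetical.

```python
from typing import List

Grid = List[List[float]]   # H x W map, one value per pixel
ProbMap = List[Grid]       # C x H x W map, one confidence grid per label


def warp_nearest(prob: ProbMap, flow_x: Grid, flow_y: Grid) -> ProbMap:
    """Backward-warp label maps from an earlier frame onto the target frame.

    flow_x / flow_y hold, for each target pixel, the displacement to its
    matching pixel in the earlier frame (nearest neighbour, clamped at borders).
    """
    height, width = len(flow_x), len(flow_x[0])
    warped = []
    for channel in prob:
        out = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                sx = min(max(int(round(x + flow_x[y][x])), 0), width - 1)
                sy = min(max(int(round(y + flow_y[y][x])), 0), height - 1)
                out[y][x] = channel[sy][sx]
        warped.append(out)
    return warped


def fuse_mean(maps: List[ProbMap]) -> ProbMap:
    """Average several label maps element-wise (assumed stand-in for learned fusion)."""
    n = len(maps)
    first = maps[0]
    return [
        [[sum(m[c][y][x] for m in maps) / n for x in range(len(first[c][y]))]
         for y in range(len(first[c]))]
        for c in range(len(first))
    ]


def argmax_labels(prob: ProbMap) -> List[List[int]]:
    """Pick the highest-scoring label index at every pixel."""
    height, width = len(prob[0]), len(prob[0][0])
    return [
        [max(range(len(prob)), key=lambda c: prob[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy 1 x 4 frame with two labels (0 = background, 1 = person).
# Earlier frame: the person is confidently at x = 1.
prev_rough = [[[0.9, 0.1, 0.9, 0.9]], [[0.1, 0.9, 0.1, 0.1]]]
# Target frame: the person has moved to x = 2, but the rough parse is unsure.
curr_rough = [[[0.9, 0.9, 0.55, 0.9]], [[0.1, 0.1, 0.45, 0.1]]]
# Each target pixel matches the earlier pixel one step to its left.
flow_x = [[-1.0, -1.0, -1.0, -1.0]]
flow_y = [[0.0, 0.0, 0.0, 0.0]]

warped_prev = warp_nearest(prev_rough, flow_x, flow_y)
fused = fuse_mean([curr_rough, warped_prev])
print(argmax_labels(curr_rough))  # [[0, 0, 0, 0]]
print(argmax_labels(fused))       # [[0, 0, 1, 0]]
```

In this toy case the rough parse of the target frame misses the person, while the fused result recovers it because the confident evidence from the earlier frame is carried along the flow before averaging.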
Pages: 1013-1021
Page count: 9
Related papers
40 entries in total
[1] [Anonymous], arXiv:1511.06881
[2] [Anonymous], 2014, TMM
[3] [Anonymous], arXiv:1603.03911
[4] [Anonymous], arXiv:1506.02897
[5] [Anonymous], 2013, ACM MM
[6] Badrinarayanan V., 2015, SEGNET DEEP CONVOLUT, DOI 10.1109/TPAMI.2016.2644615
[7] Bai M., 2016, ECCV
[8] Brox, Thomas; Malik, Jitendra. Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33(03): 500-513
[9] Chen L.-C., 2014, ARXIV
[10] Cordts, Marius; Omran, Mohamed; Ramos, Sebastian; Rehfeld, Timo; Enzweiler, Markus; Benenson, Rodrigo; Franke, Uwe; Roth, Stefan; Schiele, Bernt. The Cityscapes Dataset for Semantic Urban Scene Understanding [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 3213-3223