Surveillance Video Parsing with Single Frame Supervision

Cited by: 35

Authors:

Liu, Si ^{[1]}

Wang, Changhu ^{[2]}

Qian, Ruihe ^{[1]}

Yu, Han ^{[1]}

Bao, Renda ^{[1]}

Sun, Yao ^{[1]}

Affiliations:

[1] Chinese Acad Sci, Inst Informat Engn, State Key Lab Informat Secur, Beijing 100093, Peoples R China

[2] Toutiao AI Lab, Beijing, Peoples R China

Source:

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/CVPR.2017.114

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Surveillance video parsing, which segments video frames into several labels, e.g., face, pants, left-leg, has wide applications [41, 8]. However, annotating all frames pixel by pixel is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method that requires only one labeled frame per video in the training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (i) roughly parses the frames within the video segment, (ii) estimates the optical flow between frames, and (iii) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation, and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over state-of-the-art methods. The collected video parsing datasets can be downloaded via http://liusi-group.com/projects/SVP for further study.
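The warp-then-fuse idea in steps (ii) and (iii) of the abstract can be illustrated with a small pure-Python sketch. This is not the authors' implementation: in SVP the parsing, flow estimation, and fusion are learned networks trained end to end, whereas the sketch below assumes the per-pixel confidence maps and the flow fields are already given, uses nearest-neighbour backward warping, and swaps in a fixed weighted average for the fusion step. The names `warp_with_flow` and `fuse` are hypothetical.

```python
from typing import List

# H x W grid of per-pixel values (confidences for one label, or flow components).
Grid = List[List[float]]


def warp_with_flow(prob: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward-warp `prob` onto the target frame (nearest-neighbour sampling).

    Assumption: flow_x[y][x] and flow_y[y][x] give the displacement from
    target pixel (x, y) to its matching location in the source frame.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = len(prob), len(prob[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = min(max(int(round(x + flow_x[y][x])), 0), w - 1)
            sy = min(max(int(round(y + flow_y[y][x])), 0), h - 1)
            out[y][x] = prob[sy][sx]
    return out


def fuse(maps: List[Grid], weights: List[float]) -> Grid:
    """Fuse aligned confidence maps with a fixed weighted average.

    Assumption: a hand-set average stands in for the learned temporal
    fusion described in the abstract.
    """
    total = sum(weights)
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(wt * m[y][x] for m, wt in zip(maps, weights)) / total
         for x in range(w)]
        for y in range(h)
    ]
```

In such a setup, the rough parsing of each preceding frame would be warped onto the frame being parsed and then passed to `fuse` together with that frame's own rough parsing.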

Citation

Pages: 1013-1021

Page count: 9

References: 40 in total

[11] BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation [C]. Dai, Jifeng; He, Kaiming; Sun, Jian. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1635-1643

[12] Dai JF, 2015, PROC CVPR IEEE, P3992, DOI 10.1109/CVPR.2015.7299025

[13] Pedestrian Attribute Recognition At Far Distance [C]. Deng, Yubin; Luo, Ping; Loy, Chen Change; Tang, Xiaoou. PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014: 789-792

[14] The PASCAL Visual Object Classes Challenge: A Retrospective [J]. Everingham, Mark; Eslami, S. M. Ali; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 111 (01): 98-136

[15] Fischer P., 2015, ICCV

[16] Henriques Joao F, 2014, TPAMI

[17] Hong S., 2015, P ADV NEUR INF PROC, P1495

[18] JIA Y, 2014, P 22 ACM INT C MULT, DOI 10.1145/2647868.2654889

[19] Lee CY, 2015, JMLR WORKSH CONF PRO, V38, P562

[20] Deep Human Parsing with Active Template Regression [J]. Liang, Xiaodan; Liu, Si; Shen, Xiaohui; Yang, Jianchao; Liu, Luoqi; Dong, Jian; Lin, Liang; Yan, Shuicheng. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (12): 2402-2414
