Deep motion and appearance cues for visual tracking

Cited by: 24
Authors
Danelljan, Martin [1]
Bhat, Goutam [1]
Gladh, Susanna [1]
Khan, Fahad Shahbaz [1]
Felsberg, Michael [1]
Affiliations
[1] Linköping University, Linköping, Sweden
Funding
Swedish Research Council
Keywords
Visual tracking; Deep learning; Optical flow; Discriminative correlation filters
DOI
10.1016/j.patrec.2018.03.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Generic visual tracking is a challenging computer vision problem with numerous applications. Most existing approaches rely on appearance information by employing either hand-crafted features or deep RGB features extracted from convolutional neural networks. Despite their success, these approaches struggle when appearance information is ambiguous, leading to tracking failure. In such cases, we argue that motion cues provide discriminative and complementary information that can improve tracking performance. In contrast to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. In this paper, we investigate the impact of deep motion features in a tracking-by-detection framework. We also evaluate the fusion of hand-crafted, deep RGB, and deep motion features and show that they contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly demonstrate that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone. © 2018 Elsevier B.V. All rights reserved.
Pages: 74-81
Page count: 8