MSST-ResNet: Deep multi-scale spatiotemporal features for robust visual object tracking

Cited by: 30
Authors
Liu, Bing [1, 3]
Liu, Qiao [2]
Zhu, Zhengyu [1]
Zhang, Taiping [1]
Yang, Yong [4]
Affiliations
[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen Grad Sch, Harbin, Heilongjiang, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing, Peoples R China
[4] China Mobile IOT Co Ltd, Open Platform Dept, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Visual object tracking; Residual network; Kernelized correlation filter; Spatiotemporal features; Multi-scale features; FILTER; NETWORKS;
DOI
10.1016/j.knosys.2018.10.044
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tracking performance depends directly on the appearance features of the target object, so a robust method for constructing these features is crucial for avoiding tracking failure. Tracking methods based on Convolutional Neural Networks (CNNs) have exhibited excellent performance in recent years. However, the features from each original convolutional layer are not robust to changes in the size of the target object: once the target's size changes significantly, the tracker drifts away from it. In this paper, we present a novel tracker based on multi-scale features, spatiotemporal features, and a deep residual network that accurately estimates the size of the target object and locates it in consecutive video frames. To address scale change in visual object tracking, we sample each input image with 67 templates of different sizes and resize the samples to a fixed size. These samples are then used to train, offline, a deep residual network model with multi-scale features. Next, spatial and temporal features are fused into this model to obtain a deep multi-scale spatiotemporal feature model, which we name the MSST-ResNet feature model. Finally, the MSST-ResNet feature model is transferred to the tracking task and combined with three different Kernelized Correlation Filters (KCFs) to accurately locate the target object in consecutive video frames. Unlike previous trackers, we directly learn the various changes in target appearance by building the MSST-ResNet feature model. Experimental results demonstrate that the proposed tracking method outperforms state-of-the-art tracking methods. (C) 2018 Elsevier B.V. All rights reserved.
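The multi-scale sampling step described in the abstract (67 templates of different sizes, each resized to a fixed size) might look roughly like the sketch below. Only the count of 67 comes from the abstract. The function name `multi_scale_samples`, the geometric scale step of 1.02, the 64 x 64 output size, the nearest-neighbour resizing, and the edge clamping are all illustrative assumptions, not the authors' settings.

```python
import numpy as np

def multi_scale_samples(image, center, base_size, num_scales=67,
                        scale_step=1.02, out_size=(64, 64)):
    """Crop num_scales patches around center and resize each to out_size.

    Hypothetical helper: scale_step, out_size and the nearest-neighbour
    resize are assumptions made for illustration only.
    """
    h, w = image.shape[:2]
    cy, cx = center
    bh, bw = base_size
    out_h, out_w = out_size
    # Exponents are symmetric around 0, so the middle sample has scale 1.0.
    exponents = np.arange(num_scales) - (num_scales - 1) / 2.0
    scales = scale_step ** exponents
    samples = []
    for s in scales:
        ph, pw = bh * s, bw * s
        # Source coordinates of the centre of every output pixel.
        ys = cy - ph / 2.0 + (np.arange(out_h) + 0.5) * ph / out_h
        xs = cx - pw / 2.0 + (np.arange(out_w) + 0.5) * pw / out_w
        # Clamp to the image border so out-of-frame crops repeat edge pixels.
        ys = np.clip(np.floor(ys).astype(int), 0, h - 1)
        xs = np.clip(np.floor(xs).astype(int), 0, w - 1)
        samples.append(image[np.ix_(ys, xs)])
    return scales, np.stack(samples)

# Usage on a synthetic grayscale frame with a 20 x 30 target at (50, 60).
frame = np.arange(100 * 120, dtype=np.float64).reshape(100, 120)
scales, samples = multi_scale_samples(frame, (50, 60), (20, 30))
```

Each of the 67 samples has the same fixed shape regardless of the crop size, which is what would let a single network consume them as one training batch.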
Pages: 235-252
Number of pages: 18