Multi-level feature enhancement for real-time visual tracking
FEI Dasheng¹, SONG Huihui², ZHANG Kaihua¹
1. Jiangsu Key Laboratory of Big Data Analysis Technology(Nanjing University of Information Science and Technology), Nanjing Jiangsu 210044, China; 2. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology(Nanjing University of Information Science and Technology), Nanjing Jiangsu 210044, China
Abstract: To address the problem that the Fully-Convolutional Siamese network (SiamFC) tracker drifts and eventually fails when distractors with similar semantic information appear, a Multi-level Feature Enhanced Siamese network (MFESiam) was designed, which improves the robustness of the tracker by enhancing the representation capabilities of the high-level and shallow-level features separately. Firstly, a lightweight and effective feature fusion strategy was adopted for the shallow-level features, and data augmentation was used to simulate changes in complex scenes, such as occlusion, similar-object interference and fast motion, to enhance the texture characteristics of the shallow features. Secondly, for the high-level features, a Pixel-aware global Contextual Attention Module (PCAM) was proposed to improve localization ability by capturing long-range dependencies. Finally, extensive experiments were conducted on three challenging tracking benchmarks: OTB2015, GOT-10K and Visual Object Tracking 2018 (VOT2018). Experimental results show that the success rate of the proposed algorithm on OTB2015 and GOT-10K is 6.3 and 4.1 percentage points higher, respectively, than that of the baseline SiamFC, and that the algorithm runs at 45 frames per second, achieving real-time tracking. On the VOT2018 real-time challenge, the expected average overlap of the proposed algorithm surpasses that of the challenge winner, the high-performance Siamese Region Proposal Network (SiamRPN), which verifies the effectiveness of the proposed algorithm.
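For readers unfamiliar with the SiamFC baseline named in the abstract, the following is a minimal sketch of the matching step that Siamese trackers of this family share: template features are slid over search-region features, and the peak of the resulting response map gives the estimated target position. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cross_correlation`, `locate_peak`), the feature shapes and the random features are all hypothetical, and the feature fusion, data augmentation and PCAM components of MFESiam are not modelled.

```python
import numpy as np

def cross_correlation(template, search):
    """Slide template features (C, h, w) over search features (C, H, W).

    Returns a response map of shape (H - h + 1, W - w + 1); each entry is
    the inner product between the template and one search window.
    """
    c, h, w = template.shape
    c_s, H, W = search.shape
    if c != c_s:
        raise ValueError("template and search must have the same channel count")
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search[:, i:i + h, j:j + w]
            response[i, j] = np.sum(template * window)
    return response

def locate_peak(response):
    """Return the (row, col) index of the highest response."""
    return np.unravel_index(np.argmax(response), response.shape)

# Hypothetical features: in a real tracker both would come from a shared CNN.
rng = np.random.default_rng(0)
search_feat = rng.standard_normal((8, 20, 20))
# Cut the "target" out of the search features at row 5, column 7.
template_feat = search_feat[:, 5:11, 7:13].copy()

response = cross_correlation(template_feat, search_feat)
peak = locate_peak(response)
print("response map shape:", response.shape)
print("peak location:", peak)
```

Because the inner product is unnormalised, a distractor whose features resemble the template can produce a competing peak, which is the drift problem the abstract describes.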