Multi-level feature enhancement for real-time visual tracking

Journal of Computer Applications, 2020, Vol. 40, Issue (11): 3300-3305. DOI: 10.11772/j.issn.1001-9081.2020040514

• Virtual Reality and Multimedia Computing •

FEI Dasheng¹, SONG Huihui², ZHANG Kaihua¹

1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China;
2. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China

Received: 2020-04-23 Revised: 2020-06-30 Published: 2020-11-10 Released online: 2020-07-09
Corresponding author: SONG Huihui (born 1986), female, from Liaocheng, Shandong, professor, Ph.D., CCF member; main research interest: remote sensing image processing; songhuihui@nuist.edu.cn
About the authors: FEI Dasheng (born 1996), male, from Huai'an, Jiangsu, M.S. candidate; main research interest: video single-object tracking. ZHANG Kaihua (born 1983), male, from Rizhao, Shandong, professor, Ph.D., CCF member; main research interests: video object segmentation, video object tracking, saliency detection.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61872189, 61876088) and the Natural Science Foundation of Jiangsu Province (BK20191397, BK20170040).

Abstract

Abstract: The Fully-Convolutional Siamese visual tracking network (SiamFC) tends to drift, and eventually to lose the target, when distractors with similar semantic information appear. To address this problem, a Multi-level Feature Enhanced Siamese network (MFESiam) was designed, which improves the robustness of the tracker by enhancing the representation capabilities of the high-level and shallow-level features separately. Firstly, for shallow-level features, a lightweight and effective feature fusion strategy was adopted, and a data augmentation technique was used to simulate changes that occur in complex scenes, such as occlusion, similar-object interference and fast motion, so as to enhance the texture characteristics of the shallow features. Secondly, for high-level features, a Pixel-aware global Contextual Attention Module (PCAM) was proposed to capture long-range dependencies and thereby improve the long-term localization of the target. Finally, extensive experiments were conducted on three challenging tracking benchmarks: OTB2015, GOT-10K and the 2018 Visual Object Tracking benchmark (VOT2018). Experimental results show that the success rate of the proposed algorithm on OTB2015 and GOT-10K is 6.3 percentage points and 4.1 percentage points higher, respectively, than that of the baseline SiamFC, and that the algorithm runs at 45 frames per second, achieving real-time tracking. In the VOT2018 real-time challenge, the expected average overlap of the proposed algorithm surpasses that of the 2018 champion, the high-performance Siamese Region Proposal Network tracker (SiamRPN), which verifies the effectiveness of the proposed algorithm.

Key words: visual tracking, data augmentation, attention mechanism, global context, long-term localization
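The abstract names a global-context attention module for the high-level features but does not describe its internals. As a rough illustration of the general idea only (not the authors' PCAM), the sketch below pools a feature map into one context vector using softmax weights over pixel positions, transforms that vector, and adds it back to every position. The function name, the flat list-of-positions layout, and the weights `w_key` and `w_val` are all assumptions made for this example.

```python
import math


def global_context_attention(feat, w_key, w_val):
    """Generic global-context attention (illustrative, NOT the paper's PCAM).

    feat  : list of N pixel positions, each a list of C channel values
    w_key : list of C weights that score each position (hypothetical)
    w_val : C x C matrix (list of rows) that transforms the pooled context
    Returns the enhanced features and the attention weights.
    """
    # Score every pixel position with a linear projection of its channels.
    logits = [sum(k * v for k, v in zip(w_key, pos)) for pos in feat]

    # Softmax over positions (shifted by the max for numerical stability).
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Attention-weighted sum over positions gives one global context vector.
    channels = len(w_key)
    context = [
        sum(a * pos[ch] for a, pos in zip(attn, feat))
        for ch in range(channels)
    ]

    # Transform the context vector, then broadcast it back to each position.
    transformed = [sum(w * x for w, x in zip(row, context)) for row in w_val]
    enhanced = [[x + t for x, t in zip(pos, transformed)] for pos in feat]
    return enhanced, attn


if __name__ == "__main__":
    # Toy 3-pixel, 2-channel feature map with made-up weights.
    toy_feat = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    toy_key = [0.5, -0.5]
    toy_val = [[1.0, 0.0], [0.0, 1.0]]
    enhanced, attn = global_context_attention(toy_feat, toy_key, toy_val)
    print(attn)
    print(enhanced)
```

Because every position receives the same pooled vector, each pixel gains information about the whole search region, which is the usual motivation for this family of attention blocks. A real tracker would apply the operation to convolutional feature tensors with learned weights.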

CLC Number: TP391.41

FEI Dasheng, SONG Huihui, ZHANG Kaihua. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305.

