Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (11): 3300-3305. DOI: 10.11772/j.issn.1001-9081.2020040514

• Virtual reality and multimedia computing •

Multi-level feature enhancement for real-time visual tracking

FEI Dasheng1, SONG Huihui2, ZHANG Kaihua1   

  1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing Jiangsu 210044, China;
  2. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (Nanjing University of Information Science and Technology), Nanjing Jiangsu 210044, China
  • Received: 2020-04-23 Revised: 2020-06-30 Online: 2020-11-10 Published: 2020-07-09
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61872189, 61876088) and the Natural Science Foundation of Jiangsu Province (BK20191397, BK20170040).


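  • Corresponding author: SONG Huihui (born 1986), female, from Liaocheng, Shandong, professor, Ph.D., CCF member. Main research interest: remote sensing image processing.
  • About the authors: FEI Dasheng (born 1996), male, from Huai'an, Jiangsu, master's student. Main research interest: single-object video tracking. ZHANG Kaihua (born 1983), male, from Rizhao, Shandong, professor, Ph.D., CCF member. Main research interests: video object segmentation, video object tracking, saliency detection.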

Abstract: The Fully-Convolutional Siamese visual tracking network (SiamFC) tends to drift away from the target, and eventually to fail, when distractors with similar semantic information appear. To address this problem, a Multi-level Feature Enhanced Siamese network (MFESiam) was designed, which improves the robustness of the tracker by separately enhancing the representation capability of its high-level and shallow-level features. Firstly, for shallow-level features, a lightweight and effective feature fusion strategy was adopted, and a data augmentation technique was used to simulate changes that occur in complex scenes, such as occlusion, interference from similar objects and fast motion, so as to enhance the texture characteristics of the shallow features. Secondly, for high-level features, a Pixel-aware global Contextual Attention Module (PCAM) was proposed to improve the long-term localization ability of the tracker by capturing long-range dependencies. Finally, extensive experiments were conducted on three challenging tracking benchmarks: OTB2015, GOT-10K and 2018 Visual-Object-Tracking (VOT2018). Experimental results show that the success rate of the proposed algorithm exceeds that of the baseline SiamFC by 6.3 percentage points on OTB2015 and by 4.1 percentage points on GOT-10K, and that the algorithm runs at 45 frames per second, achieving real-time tracking. On the VOT2018 real-time challenge, the expected average overlap of the proposed algorithm surpasses that of the 2018 champion, the high-performance Siamese Region Proposal Network tracker (SiamRPN), which verifies the effectiveness of the proposed algorithm.
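
To illustrate what capturing long-range dependence at the pixel level can mean, the following is a minimal sketch of a generic dot-product attention over all positions of a flattened feature map. It is not the PCAM described in the abstract, whose structure is not given here; the function name `global_context_attention`, the scaling factor, and the residual connection are all assumptions made for illustration.

```python
import math

def global_context_attention(features):
    """Generic residual dot-product attention over all pixel positions.

    Illustrative assumption only, not the paper's PCAM.
    features: list of N pixel vectors, each a list of C floats
    (a feature map flattened over its spatial positions).
    Returns a new list of N vectors with C floats each.
    """
    channels = len(features[0])
    scale = 1.0 / math.sqrt(channels)
    output = []
    for query in features:
        # Similarity of this pixel to every pixel, near or far.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in features]
        # Numerically stable softmax over all positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Global context: attention-weighted average of all pixel vectors.
        context = [sum(w * f[ch] for w, f in zip(weights, features))
                   for ch in range(channels)]
        # Residual connection keeps the original local feature.
        output.append([x + g for x, g in zip(query, context)])
    return output

# Toy 2x2 feature map with 2 channels, flattened to 4 pixels.
pixels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
enhanced = global_context_attention(pixels)
```

In this toy input, each pixel draws more context from the distant pixel that shares its feature than from its dissimilar neighbours, which is the kind of position-independent aggregation that a global context module relies on.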

Key words: visual tracking, data augmentation, attention mechanism, global context, long-term localization


CLC Number: