Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (6): 1652-1656. DOI: 10.11772/j.issn.1001-9081.2018112419

• Artificial intelligence •

Real-time visual tracking based on dual attention Siamese network

YANG Kang1, SONG Huihui2, ZHANG Kaihua1   

  1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing Jiangsu 211800, China;
  2. Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (Nanjing University of Information Science and Technology), Nanjing Jiangsu 211800, China
  • Received: 2018-12-07  Revised: 2019-01-10  Online: 2019-06-10  Published: 2019-06-17
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61872189, 61876088), the Natural Science Foundation of Jiangsu Province (BK20170040), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX19_0311).

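  • Corresponding author: SONG Huihui
  • About the authors: YANG Kang (born 1993), male, from Xuzhou, Jiangsu, M.S. candidate; main research interest: object tracking. SONG Huihui (born 1986), female, from Liaocheng, Shandong, professor, Ph.D.; main research interest: remote sensing image processing. ZHANG Kaihua (born 1983), male, from Rizhao, Shandong, professor, Ph.D., CCF member; main research interests: image segmentation and object tracking.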

Abstract: The Fully-Convolutional Siamese network (SiamFC) tracking algorithm is prone to model drift, and hence tracking failure, when the target undergoes dramatic appearance changes. To address this problem, a new Dual Attention Siamese network (DASiam) was proposed that adapts the network model without online updating. Firstly, a modified Visual Geometry Group (VGG) network, which is more expressive and better suited to the tracking task, was used as the backbone network. Then, a novel dual attention mechanism was added to the middle layer of the network to extract features dynamically. This mechanism consists of a channel attention mechanism and a spatial attention mechanism, which transform the channel dimension and the spatial dimension of the feature maps respectively to obtain the dual attention feature maps. Finally, the feature representation of the model was further improved by fusing the feature maps of the two attention mechanisms. Experiments were conducted on three challenging tracking benchmarks: OTB2013, OTB100 and the real-time challenge of the 2017 Visual-Object-Tracking challenge (VOT2017). The experimental results show that, running at 40 frames/s, the proposed algorithm achieves success rates on OTB2013 and OTB100 that are 3.5 and 3 percentage points higher, respectively, than those of the baseline SiamFC, and surpasses SiamFC, the 2017 champion, in the VOT2017 real-time challenge, verifying the effectiveness of the proposed algorithm.
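
The abstract names a channel attention branch, a spatial attention branch, and a fusion step, but does not give their exact form. The following is only a minimal illustrative sketch under loudly stated assumptions: each branch is a parameter-free sigmoid gate computed from average pooling, and fusion is element-wise addition. The function names (`channel_attention`, `spatial_attention`, `dual_attention`) are hypothetical, and the learned layers that a real network such as DASiam would use are omitted.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average.

    feat is a nested list with shape [C][H][W]. Assumption: the gate has
    no learned weights; a trained model would normally learn this mapping.
    """
    out = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each spatial position by the sigmoid of its mean over channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def dual_attention(feat):
    """Fuse both attended maps by element-wise addition (assumed fusion rule)."""
    ca = channel_attention(feat)
    sa = spatial_attention(feat)
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(ca, sa)
    ]
```

Calling `dual_attention` on a `[C][H][W]` feature map returns a map of the same shape, so in this sketch the block could sit between two layers of a backbone without changing the tensor sizes that follow it.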

Key words: convolutional neural network, visual tracking, attention mechanism, Siamese network
