Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (8): 2219-2224. DOI: 10.11772/j.issn.1001-9081.2019122139

• Artificial Intelligence •

  • Corresponding author: ZHANG Xuande (born 1979), male, from Guyuan, Ningxia, China; professor, Ph. D.; research interests: image quality assessment, image restoration. E-mail: zhangxuande@sust.edu.cn
  • About the author: LI Shengwu (born 1994), male, from Wuwei, Gansu, China; M. S. candidate; research interests: visual tracking, deep learning.

Multi-domain convolutional neural network based on self-attention mechanism for visual tracking

LI Shengwu, ZHANG Xuande   

  1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an Shaanxi 710021, China
  • Received:2019-12-23 Revised:2020-03-15 Online:2020-08-10 Published:2020-05-13
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61871260).


Abstract: To address the model drift of the Multi-Domain convolutional neural Network (MDNet) under fast target motion and drastic appearance change, a Multi-Domain convolutional neural Network based on Self-Attention (SAMDNet) was proposed, which introduces a self-attention mechanism to improve the tracking network along both the channel and spatial dimensions. First, a spatial attention module selectively aggregated a weighted sum of the features at all positions into every position of the feature map, so that similar features became related to each other. Then, a channel attention module integrated all feature maps to selectively emphasize the importance of interdependent channels. Finally, the two results were fused into the final feature map. In addition, to address the inaccurate classification of the network model caused by the many similar sequences with different attributes in MDNet's training data, a composite loss function was constructed from a classification loss and an instance discriminant loss. First, the classification loss function computed the classification loss value; second, the instance discriminant loss function increased the weight of the target in the current video sequence and suppressed its weight in the other sequences; lastly, the two losses were fused as the final loss of the model. Experiments were conducted on the widely used benchmark datasets OTB50 and OTB2015. The results show that the proposed algorithm improves the success rate by 1.6 and 1.4 percentage points respectively over MDNet, the winning algorithm of the Visual Object Tracking 2015 (VOT2015) challenge; it also exceeds the Continuous Convolution Operators for Visual Tracking (CCOT) algorithm in both precision and success rate, and surpasses the Efficient Convolution Operators (ECO) algorithm in precision on OTB50, which verifies its effectiveness.
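The spatial and channel attention modules described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the paper's implementation: the feature map is assumed to be flattened to shape (C, H*W), the two branches are run in parallel and fused by summation, and any learned projection layers the actual network may use are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    # x: (C, N) feature map flattened over N = H*W spatial positions.
    # Each position receives a weighted sum of the features at all
    # positions, so similar features reinforce each other.
    energy = x.T @ x                 # (N, N) pairwise position affinity
    attn = softmax(energy, axis=-1)  # each row sums to 1
    out = x @ attn.T                 # (C, N) aggregated features
    return out + x                   # residual connection

def channel_attention(x):
    # x: (C, N). Inter-channel affinity selectively emphasizes
    # interdependent channels by integrating all feature maps.
    energy = x @ x.T                 # (C, C) channel affinity
    attn = softmax(energy, axis=-1)
    out = attn @ x                   # (C, N) reweighted channels
    return out + x

# Toy example: C = 4 channels over a 3x3 spatial grid.
C, H, W = 4, 3, 3
x = np.random.randn(C, H * W)
fused = spatial_attention(x) + channel_attention(x)  # assumed fusion by sum
print(fused.shape)  # (4, 9)
```

The parallel-branch-plus-sum fusion follows common dual-attention designs; the paper's exact arrangement (serial vs. parallel, and how the fusion is weighted) may differ.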
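The composite loss can likewise be sketched in NumPy. This is a plausible reading of the abstract rather than the paper's published formulation: the instance discriminant term is modeled as a softmax of the positive-class score across domains (so the target scores high in its own sequence and low in the others), and the weighting coefficient `alpha` is an assumed hyperparameter.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def composite_loss(scores, labels, domain_idx, alpha=0.1):
    # scores: (B, D, 2) per-domain binary scores (background, target)
    # labels: (B,) 1 for target samples, 0 for background samples
    # domain_idx: index of the current video sequence (domain)
    # alpha: assumed weight balancing the two loss terms
    # Classification loss: cross-entropy within the current domain.
    logp_cls = log_softmax(scores[:, domain_idx, :], axis=-1)  # (B, 2)
    cls_loss = -logp_cls[np.arange(len(labels)), labels].mean()
    # Instance discriminant loss: for positive samples, softmax of the
    # target score across domains raises the target's weight in the
    # current sequence and suppresses it in the other sequences.
    pos = labels == 1
    logp_inst = log_softmax(scores[pos, :, 1], axis=-1)        # (P, D)
    inst_loss = -logp_inst[:, domain_idx].mean()
    # Fuse the two terms as the final loss of the model.
    return cls_loss + alpha * inst_loss

# Toy example: 8 samples, 4 training domains (video sequences).
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 4, 2))
labels = np.array([1, 0, 1, 0, 1, 1, 0, 0])
loss = composite_loss(scores, labels, domain_idx=2)
print(loss > 0)  # True
```

This mirrors the structure the abstract describes (classification loss plus an instance term computed only on target samples); the actual network would compute `scores` from its domain-specific branches.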

Key words: Multi-Domain convolutional neural Network (MDNet), visual tracking, self-attention mechanism, instance discriminant loss, deep learning
