Object tracking algorithm with hierarchical features and hybrid attention

doi:10.11772/j.issn.1001-9081.2021030432

Journal of Computer Applications, 2022, Vol. 42, Issue (3): 833-843.

Topic: Artificial intelligence

Wenqiu ZHU^1,2, Guang ZOU^1,2, Zhigao ZENG^1,2

^1. School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan 412000, China
^2. Hunan Province Key Laboratory of Intelligent Information Perception and Processing Technology, Zhuzhou, Hunan 412000, China

Received: 2021-03-22 Revised: 2021-06-15 Accepted: 2021-06-17 Online: 2022-04-09 Published: 2022-03-10
Contact: Wenqiu ZHU
About the authors: ZOU Guang, born in 1997 in Yueyang, Hunan, M. S. candidate. His research interests include digital image processing and object tracking.
ZENG Zhigao, born in 1973 in Youxian, Hunan, Ph. D., professor. His research interests include machine learning, digital image processing, and intelligent computing.
Supported by:
National Key Research & Development Project of China (2019QY1604); National Natural Science Foundation of China (U1836217); Open Platform Innovation Foundation of Hunan Provincial Education Department (20K046)

Abstract:

In object tracking, the Fully-Convolutional Siamese network (SiamFC) algorithm shows poor robustness and tends to lose the target in scenes with occlusion or illumination variation. To address this, an object tracking algorithm combining feature fusion and an attention mechanism was proposed. Firstly, ResNet50 (a deep residual network) was used as the backbone network to extract richer object features. Secondly, an attention mechanism was used to filter the features: the filtered low-level and high-level template features were each cross-correlated with the corresponding search features, and the results were then fused by adaptive weighting to improve the network's discrimination between positive and negative samples. On the OTB100 (Object Tracking Benchmark) dataset, the proposed algorithm achieved a precision of 81.25% and a success rate of 64.06%. On the LaSOT (Large-scale Single Object Tracking) dataset, it achieved a precision of 49.4% and a success rate of 50.1%. Experimental results show that the proposed algorithm tracks better than SiamFC and is more robust in complex scenes.

Key words: object tracking, deep convolutional neural network, hierarchical feature fusion, attention mechanism, Siamese network
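
The correlate-then-fuse step described in the abstract can be illustrated with a minimal sketch. It assumes feature maps stored as nested C×H×W lists, a plain sliding-window cross-correlation, and scalar fusion weights normalized by a softmax. The names `cross_correlate`, `fuse`, and `fusion_logits` are hypothetical, and the weighting scheme used in the paper may differ.

```python
import math

def cross_correlate(template, search):
    """Slide a C x h x w template over a C x H x W search map (stride 1, no padding)."""
    channels = len(template)
    h, w = len(template[0]), len(template[0][0])
    H, W = len(search[0]), len(search[0][0])
    response = []
    for i in range(H - h + 1):
        row = []
        for j in range(W - w + 1):
            score = 0.0
            for c in range(channels):
                for u in range(h):
                    for v in range(w):
                        score += template[c][u][v] * search[c][i + u][j + v]
            row.append(score)
        response.append(row)
    return response

def softmax(logits):
    """Numerically stable softmax over a list of scalars."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(responses, logits):
    """Weighted sum of same-sized response maps; weights are softmax(logits)."""
    weights = softmax(logits)
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[sum(wt * r[i][j] for wt, r in zip(weights, responses))
             for j in range(cols)] for i in range(rows)]

# Toy single-channel features standing in for a low-level and a high-level branch.
low_template = [[[1.0, 0.0], [0.0, 1.0]]]
low_search = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]
high_template = [[[1.0, 1.0], [1.0, 1.0]]]
high_search = [[[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]]

low_response = cross_correlate(low_template, low_search)
high_response = cross_correlate(high_template, high_search)

# Hypothetical fusion parameters; zeros give both branches equal weight.
fusion_logits = [0.0, 0.0]
fused = fuse([low_response, high_response], fusion_logits)
print(fused)
```

In a trained tracker the fusion logits would presumably be learned together with the backbone; here they are fixed at zero, so both branches contribute equally and the peak of `fused` marks the most likely target position.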

CLC Number:

TP391.4

Wenqiu ZHU, Guang ZOU, Zhigao ZENG. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.

Figures/Tables 19

Fig. 1 Network structure of SiamFC

Tab. 1 Network structure and corresponding operation of each block

Block name	Operation	Template image size	Search image size
-	-	127×127×3	255×255×3
Block1	7×7, 64, 3×3 max pooling, s=2	31×31×64	62×62×64
Block2	$\begin{bmatrix} 1\times1, 64 \\ 3\times3, 64 \\ 1\times1, 256 \end{bmatrix} \times 3$	15×15×256	31×31×256
Hybrid-Attn	-	15×15×256	31×31×256
Block3 + Dilation	$\begin{bmatrix} 1\times1, 128 \\ 3\times3, 128 \\ 1\times1, 512 \end{bmatrix} \times 4$	15×15×512	31×31×512
Block4 + Dilation	$\begin{bmatrix} 1\times1, 256 \\ 3\times3, 256 \\ 1\times1, 1024 \end{bmatrix} \times 6$	15×15×1024	31×31×1024
Hybrid-Attn	-	15×15×1024	31×31×1024
Block5 + Dilation	$\begin{bmatrix} 1\times1, 512 \\ 3\times3, 1024 \\ 1\times1, 2048 \end{bmatrix} \times 3$	15×15×2048	31×31×2048
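
In Tab. 1 the feature maps stay at 15×15 and 31×31 from Block2 onward, including the "+ Dilation" rows. The sketch below uses the standard convolution output-size formula to show why a dilated 3×3 convolution with stride 1 can keep the spatial size while a stride-2 convolution would shrink it. The stride, padding, and dilation values are assumptions chosen for illustration; the table does not list them.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution, using the standard formula."""
    # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1

for size in (15, 31):
    # Ordinary downsampling: 3x3 kernel, stride 2, padding 1.
    strided = conv_out_size(size, kernel=3, stride=2, padding=1)
    # Dilated alternative: stride 1, dilation 2, padding equal to the dilation.
    dilated = conv_out_size(size, kernel=3, stride=1, padding=2, dilation=2)
    print(f"input {size}: stride-2 output {strided}, dilated stride-1 output {dilated}")
```

With padding equal to the dilation rate, the 3×3 kernel covers a wider window (5×5 at dilation 2) without reducing resolution, which matches the constant sizes in the table.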

Fig. 2 Network model of DeepSiamFC-Attn

Fig. 3 Flowchart of DeepSiamFC-Attn algorithm

Fig. 4 Receptive field results

Fig. 5 Structure of similarity computation

Fig. 6 Framework of hybrid attention module

Fig. 7 Implementation of hybrid attention module

Fig. 8 Loss value versus number of iterations during training

Fig. 9 Loss value versus number of iterations on validation set

Fig. 10 Algorithm evaluation results on OTB50 dataset

Fig. 11 Algorithm evaluation results on OTB100 dataset

Tab. 2 Algorithm evaluation results on LaSOT dataset (%)

Algorithm	Success rate	Precision
DeepSiamFC-Attn	50.1	49.4
SiamDW	39.7	43.7
SiamFC	38.2	42.0
DSiam [21]	36.2	40.5
ECO [22]	32.9	33.8
BACF [23]	26.3	28.3
CFNet	25.8	31.2
SRDCF [24]	24.5	24.8
Staple [25]	24.0	27.8
CSR-DCF	22.4	25.4
KCF [26]	15.6	19.0

Tab. 3 Evaluation results on VOT2018 dataset

Algorithm	Accuracy/%	Robustness/%	Average overlap (EAO)/%	Average speed/(frame·s^-1)
DeepSiamFC-Attn	57.3	30.7	28.4	52.2
DeepSRDCF [27]	49.8	39.3	25.3	41.6
MDNet	54.5	38.6	25.7	4.6
DSiam	51.2	64.6	24.4	44.1
SiamFC	50.1	58.8	18.6	54.5
CSR-DCF	44.5	66.3	24.1	13.6
ECO	48.4	32.9	26.0	77.6
CFNet	43.7	59.4	17.8	30.5

Tab. 4 Challenge attributes included in each test sequence

Sequence	Selected frames	Challenge attributes
Bolt	24, 60, 124	OCC, DEF, IPR, OPR
David3	62, 85, 188	OCC, DEF, OPR, BC
Matrix	44, 53, 75	IV, SV, OCC, FM, IPR, BC
Singer2	45, 210, 316	IV, DEF, IPR, OPR, BC
Skating1	184, 197, 318	SV, OCC, DEF, OPR, BC
Walking2	197, 219, 241	SV, OCC, LR

Fig. 12 Qualitative comparison of tracking results of various algorithms

Tab. 5 Comparison of experimental results for different network block combinations on OTB100 dataset (%)

Block combination	Precision	Success rate
SiamFC	77.10	58.32
Block1+Block2	76.41	57.44
Block1+Block3	78.27	58.52
Block1+Block4	78.39	58.33
Block1+Block5	79.13	59.18
Block2+Block3	79.44	60.32
Block2+Block4	81.25	63.43
Block2+Block5	80.43	61.45
Block3+Block4	79.35	61.55
Block3+Block5	78.21	61.23
Block4+Block5	76.33	58.43

Fig. 13 Comparison of results for different hybrid attention mechanisms on OTB100 dataset

Tab. 6 Comparison of experimental results for different hybrid attention mechanisms on VOT2018 dataset

Hybrid attention mechanism	A/%↑	R/%↓	EAO/%↑	ΔEAO/%
SiamFC	50.1	58.8	18.6	-
Base	51.3	47.7	21.7	+3.1
Base+CA	53.6	37.8	24.8	+6.2
Base+SA	55.9	33.1	26.0	+7.4
Base+CA+SA	57.3	30.7	28.4	+9.8
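
The CA and SA rows in Tab. 6 appear to denote channel attention and spatial attention. As a rough illustration of how the two kinds of gating reweight a feature map, the sketch below applies a parameter-free stand-in: each channel is scaled by the sigmoid of its global average, then each location is scaled by the sigmoid of its mean across channels. This is an assumption-laden toy, not the module shown in Fig. 6 and Fig. 7: the function names are hypothetical, there are no learned weights, and the sequential channel-then-spatial order is assumed.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat):
    """Scale each channel of a C x H x W nested list by sigmoid(global average)."""
    gated = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        gated.append([[gate * v for v in row] for row in channel])
    return gated

def spatial_attention(feat):
    """Scale each location by sigmoid(mean across channels at that location)."""
    channels = len(feat)
    rows, cols = len(feat[0]), len(feat[0][0])
    gates = [[sigmoid(sum(feat[c][i][j] for c in range(channels)) / channels)
              for j in range(cols)] for i in range(rows)]
    return [[[gates[i][j] * feat[c][i][j] for j in range(cols)]
             for i in range(rows)] for c in range(channels)]

def hybrid_attention(feat):
    """Channel gating followed by spatial gating (the order is an assumption)."""
    return spatial_attention(channel_attention(feat))

# Two-channel toy feature map: one silent channel, one with a diagonal pattern.
feat = [[[0.0, 0.0], [0.0, 0.0]],
        [[2.0, 0.0], [0.0, 2.0]]]
out = hybrid_attention(feat)
print(out)
```

The output keeps the input shape. Active channels and locations receive gates above 0.5 and inactive ones receive exactly 0.5, so the gating suppresses weak responses relative to strong ones.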

References 27

1	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865. DOI: 10.1007/978-3-319-48881-3_56
2	VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2805-2813. DOI: 10.1109/cvpr.2017.531
3	ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4591-4600. DOI: 10.1109/cvpr.2019.00472
4	LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935
5	WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4854-4863. DOI: 10.1109/cvpr.2018.00510
6	PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. [2020-10-10]. DOI: 10.1007/s11263-019-01283-0
7	FEI D S, SONG H H, ZHANG K H. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305.
8	LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. DOI: 10.11772/j.issn.1001-9081.2019122139
9	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
10	YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-10-10].
11	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2020-10-10].
12	CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5659-5667. DOI: 10.1109/cvpr.2017.667
13	WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. DOI: 10.1109/cvpr42600.2020.01155
14	HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1562-1577.
15	WU Y, LIM J, YANG M H. Online object tracking: a benchmark[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2411-2418. DOI: 10.1109/cvpr.2013.312
16	WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. DOI: 10.1109/tpami.2014.2388226
17	KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-53.
18	FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5374-5383. DOI: 10.1109/cvpr.2019.00552
19	LUKEZIC A, VOJIR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6309-6318. DOI: 10.1109/cvpr.2017.515
20	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. DOI: 10.1109/cvpr.2016.465
21	GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1763-1771. DOI: 10.1109/iccv.2017.196
22	DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6638-6646. DOI: 10.1109/cvpr.2017.733
23	GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1135-1143. DOI: 10.1109/iccv.2017.129
24	DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4310-4318. DOI: 10.1109/iccv.2015.490
25	BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1401-1409. DOI: 10.1109/cvpr.2016.156
26	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390
27	DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2015: 58-66. DOI: 10.1109/iccvw.2015.84
