Object tracking algorithm with hierarchical features and hybrid attention

doi:10.11772/j.issn.1001-9081.2021030432

Journal of Computer Applications, 2022, Vol. 42, Issue (3): 833-843. DOI: 10.11772/j.issn.1001-9081.2021030432

Special Issue: Artificial Intelligence


Wenqiu ZHU¹,², Guang ZOU¹,², Zhigao ZENG¹,²

1. School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan 412000, China
2. Hunan Province Key Laboratory of Intelligent Information Perception and Processing Technology, Zhuzhou, Hunan 412000, China

Received: 2021-03-22 Revised: 2021-06-15 Accepted: 2021-06-17 Online: 2022-04-09 Published: 2022-03-10
Contact: Wenqiu ZHU
About the authors: ZOU Guang, born in 1997, a native of Yueyang, Hunan, M.S. candidate. His research interests include digital image processing and object tracking.
ZENG Zhigao, born in 1973, a native of Youxian, Hunan, Ph.D., professor. His research interests include machine learning, digital image processing and intelligent computing.
Supported by:
National Key Research & Development Project of China (2019QY1604); National Natural Science Foundation of China (U1836217); Open Platform Innovation Foundation of Hunan Provincial Education Department (20K046)


Abstract:

In object tracking, the Fully-Convolutional Siamese network (SiamFC) algorithm suffers from poor robustness and loss of the tracked object in scenes with occlusion and illumination variation. To address these problems, an object tracking algorithm combining an attention mechanism with feature fusion was proposed. Firstly, ResNet50 (a deep residual network) was used as the backbone network to extract richer object features. Secondly, an attention mechanism was used to filter the features: the filtered low-level and high-level template features were each cross-correlated with the corresponding search features, and the resulting responses were fused by adaptive weighting to improve the discrimination between positive and negative samples. On the OTB100 (Object Tracking Benchmark) dataset, the proposed algorithm achieved a precision of 81.25% and a success rate of 64.06%. On the LaSOT (Large-scale Single Object Tracking) dataset, it achieved a precision of 49.4% and a success rate of 50.1%. Experimental results show that the proposed algorithm tracks better than SiamFC and is more robust in complex scenes.

Key words: object tracking, deep convolutional neural network, hierarchical feature fusion, attention mechanism, Siamese network
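The two-branch matching step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses single-channel toy maps in pure Python, and the names `cross_correlate`, `softmax` and `fuse`, as well as the idea of producing the fusion weights from a softmax over two scalar logits, are assumptions made only for illustration. The abstract says only that the fusion is adaptive and weighted.

```python
import math


def cross_correlate(template, search):
    """Valid 2-D cross-correlation of one template map slid over one search map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            row.append(sum(template[a][b] * search[i + a][j + b]
                           for a in range(th) for b in range(tw)))
        response.append(row)
    return response


def softmax(logits):
    """Turn arbitrary scalars into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(responses, logits):
    """Weighted sum of same-sized response maps (hypothetical fusion rule)."""
    weights = softmax(logits)
    height, width = len(responses[0]), len(responses[0][0])
    return [[sum(w * r[i][j] for w, r in zip(weights, responses))
             for j in range(width)] for i in range(height)]


# Toy stand-ins for low-level and high-level features (made-up numbers).
low_template = [[1, 0], [0, 1]]
low_search = [[1, 2, 0], [0, 3, 1], [2, 0, 4]]
high_template = [[0, 1], [1, 0]]
high_search = [[0, 1, 1], [2, 0, 1], [1, 3, 0]]

# Each template is matched against the search features of its own level.
low_resp = cross_correlate(low_template, low_search)
high_resp = cross_correlate(high_template, high_search)

# Equal logits give equal weights; in a trained model they would be learned.
fused = fuse([low_resp, high_resp], logits=[0.0, 0.0])
```

In a tracker of this kind, the peak of the fused response map indicates the most likely target position. Here both branches peak in the bottom-right cell, so the fused map does as well.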


CLC Number:

TP391.4

Wenqiu ZHU, Guang ZOU, Zhigao ZENG. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.


Figures/Tables 19

Fig. 1 Network structure of SiamFC

Tab. 1 Network structure and corresponding operation of each block

Block name	Operation	Template image size	Search image size
—	—	127×127×3	255×255×3
Block1	7×7, 64; 3×3 max pool, s=2	31×31×64	62×62×64
Block2	[1×1, 64; 3×3, 64; 1×1, 256] × 3	15×15×256	31×31×256
Hybrid-Attn	—	15×15×256	31×31×256
Block3 + Dilation	[1×1, 128; 3×3, 128; 1×1, 512] × 4	15×15×512	31×31×512
Block4 + Dilation	[1×1, 256; 3×3, 256; 1×1, 1024] × 6	15×15×1024	31×31×1024
Hybrid-Attn	—	15×15×1024	31×31×1024
Block5 + Dilation	[1×1, 512; 3×3, 1024; 1×1, 2048] × 3	15×15×2048	31×31×2048
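The constant 15×15 and 31×31 sizes in the dilated blocks follow from the standard convolution output-size formula. The sketch below works through that arithmetic. The particular padding and dilation values (both 2, with a 3×3 kernel and stride 1) are assumptions chosen to show how resolution can be preserved; the table does not state them. The 17×17 response size is likewise derived here from the table's feature sizes, not quoted from the source.

```python
def conv_out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial axis."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (n + 2 * padding - effective_kernel) // stride + 1


def xcorr_out_size(search, template):
    """Output length of a valid cross-correlation along one spatial axis."""
    return search - template + 1


# Hypothetical 3x3 dilated convolution that keeps the spatial size unchanged.
template_after = conv_out_size(15, kernel=3, stride=1, padding=2, dilation=2)
search_after = conv_out_size(31, kernel=3, stride=1, padding=2, dilation=2)

# Sliding a 15x15 template over a 31x31 search map.
response_size = xcorr_out_size(31, 15)

print(template_after, search_after, response_size)  # prints: 15 31 17
```

Keeping the stride at 1 and enlarging the receptive field through dilation is what lets the deeper blocks retain the same spatial resolution as Block2.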

Fig. 2 Network model of DeepSiamFC-Attn

Fig. 3 Flowchart of DeepSiamFC-Attn algorithm

Fig. 4 Receptive field results

Fig. 5 Structure of similarity computing

Fig. 6 Framework of hybrid-attention module

Fig. 7 Hybrid-Attention implementation module

Fig. 8 Loss value changes with iterations during training process

Fig. 9 Loss value varies with iterations on validation set

Fig. 10 Algorithm evaluation results on OTB50 dataset

Fig. 11 Algorithm evaluation results on OTB100 dataset

Tab. 2 Algorithm evaluation results on LaSOT dataset

Algorithm	Success rate/%	Precision/%	Algorithm	Success rate/%	Precision/%
DeepSiamFC-Attn	50.1	49.4	CFNet	25.8	31.2
SiamDW	39.7	43.7	SRDCF[24]	24.5	24.8
SiamFC	38.2	42.0	Staple[25]	24.0	27.8
DSiam[21]	36.2	40.5	CSR-DCF	22.4	25.4
ECO[22]	32.9	33.8	KCF[26]	15.6	19.0
BACF[23]	26.3	28.3

Tab. 3 Evaluation results on VOT2018 dataset

Algorithm	Accuracy/%	Robustness/%	Expected average overlap (EAO)/%	Average speed/(frame·s^-1)
DeepSiamFC-Attn	57.3	30.7	28.4	52.2
DeepSRDCF[27]	49.8	39.3	25.3	41.6
MDNet	54.5	38.6	25.7	4.6
DSiam	51.2	64.6	24.4	44.1
SiamFC	50.1	58.8	18.6	54.5
CSR-DCF	44.5	66.3	24.1	13.6
ECO	48.4	32.9	26.0	77.6
CFNet	43.7	59.4	17.8	30.5

Tab. 4 Challenge attributes included in each test sequence

Sequence	Selected frames	Challenge attributes
Bolt	24, 60, 124	OCC, DEF, IPR, OPR
David3	62, 85, 188	OCC, DEF, OPR, BC
Matrix	44, 53, 75	IV, SV, OCC, FM, IPR, BC
Singer2	45, 210, 316	IV, DEF, IPR, OPR, BC
Skating1	184, 197, 318	SV, OCC, DEF, OPR, BC
Walking2	197, 219, 241	SV, OCC, LR

Fig. 12 Qualitative comparison of tracking results of various algorithms

Tab. 5 Experimental results comparison of different network block combination on OTB100 dataset

Network block combination	Precision/%	Success rate/%	Network block combination	Precision/%	Success rate/%
SiamFC	77.10	58.32	Block2+Block4	81.25	63.43
Block1+Block2	76.41	57.44	Block2+Block5	80.43	61.45
Block1+Block3	78.27	58.52	Block3+Block4	79.35	61.55
Block1+Block4	78.39	58.33	Block3+Block5	78.21	61.23
Block1+Block5	79.13	59.18	Block4+Block5	76.33	58.43
Block2+Block3	79.44	60.32

Fig. 13 Result comparison of hybrid-attention mechanism on OTB100 dataset

Tab. 6 Experimental results comparison of various hybrid attention mechanism on VOT2018 dataset

Hybrid attention mechanism	A/%↑	R/%↓	EAO/%↑	ΔEAO/%
SiamFC	50.1	58.8	18.6	—
Base	51.3	47.7	21.7	+3.1
Base+CA	53.6	37.8	24.8	+6.2
Base+SA	55.9	33.1	26.0	+7.4
Base+CA+SA	57.3	30.7	28.4	+9.8
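The ΔEAO column can be reproduced from the EAO column by subtracting the SiamFC baseline. The short sketch below redoes that arithmetic using the values from the table; the variable names are illustrative only.

```python
# (variant, EAO in %) pairs copied from the table above.
rows = [
    ("SiamFC", 18.6),
    ("Base", 21.7),
    ("Base+CA", 24.8),
    ("Base+SA", 26.0),
    ("Base+CA+SA", 28.4),
]

baseline_eao = rows[0][1]

# Gain of each variant over the SiamFC baseline, rounded to one decimal place.
delta_eao = {name: round(eao - baseline_eao, 1) for name, eao in rows[1:]}

for name, gain in delta_eao.items():
    print(f"{name}: +{gain}")
```

Read this way, the table shows that each attention branch alone adds to the Base gain, and that using both together gives the largest EAO improvement of the listed variants.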

References 27

1	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865. 10.1007/978-3-319-48881-3_56
2	VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2805-2813. 10.1109/cvpr.2017.531
3	ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4591-4600. 10.1109/cvpr.2019.00472
4	LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. 10.1109/cvpr.2018.00935
5	WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4854-4863. 10.1109/cvpr.2018.00510
6	PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. [2020-10-10]. 10.1007/s11263-019-01283-0
7	FEI D S, SONG H H, ZHANG K H. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305.
8	LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. 10.11772/j.issn.1001-9081.2019122139
9	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
10	YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-10-10].
11	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2020-10-10].
12	CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5659-5667. 10.1109/cvpr.2017.667
13	WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. 10.1109/cvpr42600.2020.01155
14	HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1562-1577.
15	WU Y, LIM J, YANG M H. Online object tracking: a benchmark[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2411-2418. 10.1109/cvpr.2013.312
16	WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. 10.1109/tpami.2014.2388226
17	KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-53.
18	FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5374-5383. 10.1109/cvpr.2019.00552
19	LUKEZIC A, VOJIR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6309-6318. 10.1109/cvpr.2017.515
20	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. 10.1109/cvpr.2016.465
21	GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1763-1771. 10.1109/iccv.2017.196
22	DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6638-6646. 10.1109/cvpr.2017.733
23	GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1135-1143. 10.1109/iccv.2017.129
24	DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4310-4318. 10.1109/iccv.2015.490
25	BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1401-1409. 10.1109/cvpr.2016.156
26	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(3): 583-596. 10.1109/tpami.2014.2345390
27	DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2015: 58-66. 10.1109/iccvw.2015.84
