SiamTrans: tiny object tracking algorithm based on Siamese network and Transformer

doi:10.11772/j.issn.1001-9081.2022111790

Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3733-3739. DOI: 10.11772/j.issn.1001-9081.2022111790

Special Issue: Artificial Intelligence

Haitao GONG¹, Zhihua CHEN¹, Bin SHENG², Bingyan ZHU¹

1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Received: 2022-12-06 Revised: 2023-02-23 Accepted: 2023-02-27 Online: 2023-03-13 Published: 2023-12-10
Contact: Zhihua CHEN
About author: GONG Haitao, born in 1998, M.S. candidate. His research interests include computer vision and deep learning.
SHENG Bin, born in 1981, Ph.D., professor. His research interests include virtual reality and computer graphics.
ZHU Bingyan, born in 1998, M.S. candidate. Her research interests include computer vision and deep learning.
Supported by:
Fund Project of National Key Laboratory of Space Intelligent Control (HTKJ2022KL502010)

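Author details: GONG Haitao (born 1998), male, from Linyi, Shandong, M.S. candidate. His research interests include computer vision and deep learning.
CHEN Zhihua (born 1969), male, from Shangrao, Jiangxi, Ph.D., professor, CCF distinguished member. His research interests include computer vision and machine learning. Email: czh@ecust.edu.cn
SHENG Bin (born 1981), male, from Wuhan, Hubei, Ph.D., professor, CCF member. His research interests include virtual reality and computer graphics.
ZHU Bingyan (born 1998), female, from Lu'an, Anhui, M.S. candidate. Her research interests include computer vision and deep learning.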

Abstract

To address the poor robustness, low precision, and low success rate of existing tiny object tracking algorithms, SiamTrans, a tiny object tracking algorithm based on a Siamese network and Transformer, was proposed. Firstly, a similarity response map calculation module was designed based on the Transformer mechanism. The module stacks several layers of feature encoding-decoding structures and uses multi-head self-attention and multi-head cross-attention to query template feature map information in search-region feature maps at different levels, which avoids falling into local optima and yields a high-quality similarity response map. Secondly, a Transformer-based Prediction Module (PM) was designed in the prediction subnetwork, where self-attention processes redundant feature information in the feature maps of the prediction branches to improve the prediction precision of each branch. Experimental results on the Small90 dataset show that, compared with the TransT (Transformer Tracking) algorithm, the proposed algorithm improves tracking precision and tracking success rate by 8.0 and 9.5 percentage points, respectively, indicating better tracking performance for tiny objects.
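The attention step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that search-region tokens act as queries and template tokens as keys and values, it omits the learned projection matrices, and the names `multi_head_attention`, `template`, and `search` are hypothetical. It only shows how self-attention and cross-attention produce a search-sized feature map in which every location is a weighted mixture of template features.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head; inputs are lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        )
    return outputs, weights


def multi_head_attention(queries, keys, values, num_heads):
    """Split channels into heads, attend per head, concatenate the results.

    Assumption: learned projections are left out to keep the sketch short.
    """
    d_model = len(queries[0])
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    outputs = [[] for _ in queries]
    all_weights = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        out, w = attention(
            [q[sl] for q in queries], [k[sl] for k in keys], [v[sl] for v in values]
        )
        for row, o in zip(outputs, out):
            row.extend(o)
        all_weights.append(w)
    return outputs, all_weights


random.seed(0)
D_MODEL, NUM_HEADS = 8, 2
# Hypothetical flattened feature maps: 2x2 template tokens, 4x4 search tokens.
template = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(4)]
search = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(16)]

# Self-attention: the search features attend to themselves.
search_enc, _ = multi_head_attention(search, search, search, NUM_HEADS)
# Cross-attention: each search token queries the template tokens.
fused, cross_w = multi_head_attention(search_enc, template, template, NUM_HEADS)
```

Each row of `cross_w[h]` is a probability distribution over template tokens for one search location, which is one way to read the "similarity response" idea; a trained model would stack several such layers with learned weights.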

Key words: object tracking, tiny object, Siamese network, attention mechanism, Transformer

CLC Number: TP391.4

Haitao GONG, Zhihua CHEN, Bin SHENG, Bingyan ZHU. SiamTrans: tiny object tracking algorithm based on Siamese network and Transformer [J]. Journal of Computer Applications, 2023, 43(12): 3733-3739.

References

1	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390
2	LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935
3	GUO D, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6268-6276. DOI: 10.1109/cvpr42600.2020.00630
4	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
5	WANG M T, YANG W Z, WU Y Z. Survey of single target tracking algorithms based on Siamese network [J]. Journal of Computer Applications, 2023, 43(3): 661-673. (in Chinese)
6	LIU C, DING W, YANG J, et al. Aggregation signature for small object tracking [J]. IEEE Transactions on Image Processing, 2020, 29: 1738-1747. DOI: 10.1109/tip.2019.2940477
7	ZHU Y, LI C, LIU Y, et al. Tiny object tracking: a large-scale dataset and a baseline [EB/OL]. (2022-02-11) [2022-09-16]. DOI: 10.1109/tnnls.2023.3239529
8	MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 445-461.
9	ZHU W Q, ZOU G, ZENG Z G. Object tracking algorithm with hierarchical features and hybrid attention [J]. Journal of Computer Applications, 2022, 42(3): 833-843. (in Chinese)
10	AHMADI K, SALARI E. Small dim object tracking using frequency and spatial domain information [J]. Pattern Recognition, 2016, 58: 227-234. DOI: 10.1016/j.patcog.2016.04.001
11	AHMADI K, SALARI E. Small dim object tracking using a multi objective particle swarm optimisation technique [J]. IET Image Processing, 2015, 9(9): 820-826. DOI: 10.1049/iet-ipr.2014.0927
12	MARVASTI-ZADEH S M, KHAGHANI J, CHANEI-YAKHDAN H, et al. COMET: context-aware IoU-guided network for small object tracking [C]// Proceedings of the 2020 Asian Conference on Computer Vision, LNCS 12623. Cham: Springer, 2021: 594-611.
13	HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels [C]// Proceedings of the 2012 European Conference on Computer Vision, LNCS 7575. Berlin: Springer, 2012: 702-715.
14	LI Y, ZHU J. A scale adaptive kernel correlation filter tracker with feature integration [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8926. Cham: Springer, 2015: 254-265.
15	DANELLJAN M, BHAT G, SHAHBAZ KHAN F, et al. ECO: efficient convolution operators for tracking [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6931-6939. DOI: 10.1109/cvpr.2017.733
16	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 850-865.
17	LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4282-4291. DOI: 10.1109/cvpr.2019.00441
18	ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11213. Cham: Springer, 2018: 103-119.
19	WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1328-1338. DOI: 10.1109/cvpr.2019.00142
20	CHEN Z, ZHONG B, LI G, et al. Siamese box adaptive network for visual tracking [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6667-6676. DOI: 10.1109/cvpr42600.2020.00670
21	YAN B, PENG H, FU J, et al. Learning spatio-temporal Transformer for visual tracking [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10428-10437. DOI: 10.1109/iccv48922.2021.01028
22	WANG N, ZHOU W, WANG J, et al. Transformer meets tracker: exploiting temporal context for robust visual tracking [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 1571-1580. DOI: 10.1109/cvpr46437.2021.00162
23	BLATTER P, KANAKIS M, DANELLJAN M, et al. Efficient visual tracking with Exemplar Transformers [C]// Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2023: 1571-1581. DOI: 10.1109/wacv56688.2023.00162
24	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
25	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
26	RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252. DOI: 10.1007/s11263-015-0816-y
27	HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577. DOI: 10.1109/tpami.2019.2957464
28	CHEN X, YAN B, ZHU J, et al. Transformer tracking [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8122-8131. DOI: 10.1109/cvpr46437.2021.00803
29	CHOI J, CHANG H J, JEONG J, et al. Visual tracking using attention-modulated disintegration and integration [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4321-4330. DOI: 10.1109/cvpr.2016.468
	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. DOI: 10.1109/cvpr.2016.468
30	GRABNER H, GRABNER M, BISCHOF H. Real-time tracking via on-line boosting [EB/OL]. [2022-11-20]. DOI: 10.5244/c.20.6
31	ZHANG J, MA S, SCLAROFF S. MEEM: robust tracking via multiple experts using entropy minimization [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8694. Cham: Springer, 2014: 188-203.
32	ZHANG Z, PENG H, FU J, et al. Ocean: object-aware anchor-free tracking [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12366. Cham: Springer, 2020: 771-787.
33	CHEN X, YAN B, ZHU J, et al. Transformer tracking [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8122-8131. DOI: 10.1109/cvpr46437.2021.00803

Algorithm	Occlusion		Deformation		Motion blur		Fast motion		Low resolution
	Precision	Success rate	Precision	Success rate	Precision	Success rate	Precision	Success rate	Precision	Success rate
SCT^[29]	0.726	0.460	0.676	0.425	0.421	0.260	0.500	0.317	0.666	0.414
KCF_AST^[6]	0.772	0.469	0.805	0.416	0.582	0.368	0.645	0.421	0.783	0.475
MDNet_AST^[6]	0.803	0.507	0.794	0.519	0.717	0.464	0.809	0.537	0.805	0.527
ECO^[15]	0.757	0.480	0.777	0.508	0.696	0.453	0.770	0.514	0.900	0.587
SiamTrans	0.796	0.571	0.826	0.534	0.763	0.525	0.836	0.570	0.844	0.599
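The page does not define how precision and success rate are computed. As a hedged illustration only, the sketch below assumes the definitions commonly used in one-pass tracking evaluation: precision is the fraction of frames whose predicted box center lies within a pixel threshold of the ground-truth center (20 pixels here, an assumption), and success rate is the area under the curve of the fraction of frames whose overlap (IoU) exceeds thresholds from 0 to 1. Boxes are `(x, y, width, height)` tuples, and all function names and sample boxes are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    return math.hypot(
        (a[0] + a[2] / 2) - (b[0] + b[2] / 2),
        (a[1] + a[3] / 2) - (b[1] + b[3] / 2),
    )


def precision(pred, gt, threshold=20.0):
    """Fraction of frames with center error within the threshold (assumed 20 px)."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


def success_auc(pred, gt, steps=21):
    """Mean, over evenly spaced IoU thresholds in [0, 1], of the fraction of
    frames whose IoU exceeds the threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


# Made-up 3-frame sequence with a tiny 8x8 target: exact hit, small drift, lost target.
gt_boxes = [(10, 10, 8, 8), (12, 11, 8, 8), (15, 13, 8, 8)]
pred_boxes = [(10, 10, 8, 8), (14, 11, 8, 8), (60, 60, 8, 8)]

prec = precision(pred_boxes, gt_boxes)
succ = success_auc(pred_boxes, gt_boxes)
```

For tiny targets a fixed pixel threshold is lenient relative to object size, so the two metrics can diverge noticeably, which is consistent with the gap between the precision and success rate columns in the table.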

Algorithm	Small112		UAV20L
	Success rate	Precision	Success rate	Precision
SAMF^[14]	—	—	0.380	0.457
KCF^[1]	0.416	0.580	0.202	0.321
KCF_AST^[6]	0.492	0.710	0.204	0.345
ECO^[15]	0.629	0.779	—	—
CSK^[13]	0.429	0.585	0.177	0.309
DaSiamRPN_AST^[6]	0.693	0.805	0.705	0.717
SiamTrans	0.687	0.809	0.710	0.721

Algorithm	Overall	Scale variation	Fast motion	Target disappearance	Illumination variation	Camera motion	Motion blur
ECO^[15]	0.3260	0.3200	0.1870	0.1120	0.3820	0.2950	0.0802
Ocean^[32]	0.3430	0.3570	0.2090	0.1250	0.3950	0.2720	0.0936
SiamRPN++^[17]	0.3590	0.3680	0.1960	0.1160	0.4010	0.2880	0.0828
SiamBAN^[20]	0.3490	0.3730	0.2120	0.1110	0.4200	0.2860	0.0976
MKDNet^[7]	0.4130	0.4410	0.2640	0.1710	0.4740	0.3780	0.1410
SiamTrans	0.4190	0.4388	0.2703	0.1705	0.4803	0.3806	0.1478

Configuration	Small90		UAV123_10fps
	Success rate	Precision	Success rate	Precision
Cross-correlation operation	0.502	0.756	0.596	0.815
2-layer FEM-FDM	0.469	0.738	0.600	0.812
4-layer FEM-FDM	0.485	0.747	0.583	0.784
6-layer FEM-FDM	0.507	0.768	0.603	0.811
8-layer FEM-FDM	0.476	0.740	0.594	0.807

Number of PM layers	Small90		UAV123_10fps
	Success rate	Precision	Success rate	Precision
0	0.487	0.745	0.605	0.802
2	0.487	0.749	0.611	0.814
4	0.492	0.745	0.621	0.819
6	0.505	0.772	0.621	0.822
8	0.025	0.042	0.017	0.033
