SiamTrans: tiny object tracking algorithm based on Siamese network and Transformer

Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3733-3739. DOI: 10.11772/j.issn.1001-9081.2022111790


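Haitao GONG¹, Zhihua CHEN¹, Bin SHENG², Bingyan ZHU¹

¹ School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
² School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Received: 2022-12-06 Revised: 2023-02-23 Accepted: 2023-02-27 Online: 2023-03-13 Published: 2023-12-10
Corresponding author: Zhihua CHEN
About the authors: GONG Haitao, born in 1998, a native of Linyi, Shandong, M. S. candidate. His research interests include computer vision, deep learning.
CHEN Zhihua, born in 1969, a native of Shangrao, Jiangxi, Ph. D., professor, CCF Distinguished Member. His research interests include computer vision, machine learning. Email: czh@ecust.edu.cn
SHENG Bin, born in 1981, a native of Wuhan, Hubei, Ph. D., professor, CCF Member. His research interests include virtual reality, computer graphics.
ZHU Bingyan, born in 1998, a native of Lu'an, Anhui, M. S. candidate. Her research interests include computer vision, deep learning.
Supported by:
Fund Project of National Key Laboratory of Space Intelligent Control (HTKJ2022KL502010)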

Abstract:

To address the poor robustness and the low precision and success rate of existing tiny object tracking algorithms, a tiny object tracking algorithm based on Siamese network and Transformer, named SiamTrans, was proposed. Firstly, a similarity response map calculation module was designed based on the Transformer mechanism. In this module, several feature encoding-decoding layers were stacked, and multi-head self-attention and multi-head cross-attention were used to query template feature map information within search-region feature maps at different levels, which avoided falling into local optimal solutions and yielded a high-quality similarity response map. Secondly, a Transformer-based Prediction Module (PM) was designed in the prediction subnetwork, in which self-attention was used to process redundant feature information in the prediction branch feature maps, improving the prediction precision of the different prediction branches. Experimental results on the Small90 dataset show that, compared with the TransT (Transformer Tracking) algorithm, the proposed algorithm achieves 8.0 and 9.5 percentage points higher tracking precision and tracking success rate, respectively, indicating better tracking performance for tiny objects.

Key words: object tracking, tiny object, Siamese network, attention mechanism, Transformer
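The abstract above describes the similarity response map module only at a high level. The following is a minimal pure-Python sketch of one plausible reading, assuming TransT-style fusion: in each stacked layer, template tokens pass through self-attention (encoding), and search-region tokens pass through self-attention followed by multi-head cross-attention in which they act as queries over the encoded template (decoding). The function names, the residual connections, the use of a single feature level, and the omission of learned Q/K/V projections, layer normalization and feed-forward sublayers are all simplifying assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(queries, keys_values, num_heads):
    """Scaled dot-product attention with channels split evenly across heads.

    Tokens are lists of floats. Assumption: learned Q/K/V/output projections
    are omitted, so each head attends over a raw slice of the channels.
    """
    dim = len(queries[0])
    if dim % num_heads:
        raise ValueError("channel count must be divisible by num_heads")
    head_dim = dim // num_heads
    out = [[0.0] * dim for _ in queries]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        keys = [kv[lo:hi] for kv in keys_values]
        for i, query in enumerate(queries):
            q = query[lo:hi]
            # Similarity of this query token to every key token, scaled by sqrt(d).
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim) for k in keys]
            weights = softmax(scores)
            # Keys double as values (no projections in this sketch).
            for c in range(head_dim):
                out[i][lo + c] = sum(w * k[c] for w, k in zip(weights, keys))
    return out


def residual(x, y):
    """Element-wise x + y for two equally shaped token lists."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def encode_decode(template, search, num_layers=6, num_heads=2):
    """Hypothetical stacked encoder-decoder fusion of template and search tokens.

    Per layer: template self-attention (encoding), search self-attention, then
    cross-attention where search tokens query the encoded template (decoding).
    Returns one fused feature vector per search-region position.
    """
    for _ in range(num_layers):
        template = residual(template, multi_head_attention(template, template, num_heads))
        search = residual(search, multi_head_attention(search, search, num_heads))
        search = residual(search, multi_head_attention(search, template, num_heads))
    return search
```

For example, `encode_decode(template_tokens, search_tokens, num_layers=6)` returns one fused vector per search position, which in a full tracker would feed the prediction subnetwork. The `num_layers` argument corresponds to what the FEM-FDM rows of the ablation table appear to vary; the default of 6 is chosen only because that row scores best on Small90 there.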

CLC number: TP391.4

Haitao GONG, Zhihua CHEN, Bin SHENG, Bingyan ZHU. SiamTrans: tiny object tracking algorithm based on Siamese network and Transformer[J]. Journal of Computer Applications, 2023, 43(12): 3733-3739.

Figures/Tables: 11

References (33)

1	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390
2	LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935
3	GUO D, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6268-6276. DOI: 10.1109/cvpr42600.2020.00630
4	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
5	WANG M T, YANG W Z, WU Y Z. Survey of single target tracking algorithms based on Siamese network[J]. Journal of Computer Applications, 2023, 43(3): 661-673.
6	LIU C, DING W, YANG J, et al. Aggregation signature for small object tracking[J]. IEEE Transactions on Image Processing, 2020, 29: 1738-1747. DOI: 10.1109/tip.2019.2940477
7	ZHU Y, LI C, LIU Y, et al. Tiny object tracking: a large-scale dataset and a baseline[EB/OL]. (2022-02-11) [2022-09-16].. DOI: 10.1109/tnnls.2023.3239529
8	MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 445-461.
9	ZHU W Q, ZOU G, ZENG Z G. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.
10	AHMADI K, SALARI E. Small dim object tracking using frequency and spatial domain information[J]. Pattern Recognition, 2016, 58: 227-234. DOI: 10.1016/j.patcog.2016.04.001
11	AHMADI K, SALARI E. Small dim object tracking using a multi objective particle swarm optimisation technique[J]. IET Image Processing, 2015, 9(9): 820-826. DOI: 10.1049/iet-ipr.2014.0927
12	MARVASTI-ZADEH S M, KHAGHANI J, GHANEI-YAKHDAN H, et al. COMET: context-aware IoU-guided network for small object tracking[C]// Proceedings of the 2020 Asian Conference on Computer Vision, LNCS 12623. Cham: Springer, 2021: 594-611.
13	HENRIQUES J F, CASEIRO R, MARTINS P, et al. Exploiting the circulant structure of tracking-by-detection with kernels[C]// Proceedings of the 2012 European Conference on Computer Vision, LNCS 7575. Berlin: Springer, 2012: 702-715.
14	LI Y, ZHU J. A scale adaptive kernel correlation filter tracker with feature integration[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8926. Cham: Springer, 2015: 254-265.
15	DANELLJAN M, BHAT G, SHAHBAZ KHAN F, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6931-6939. DOI: 10.1109/cvpr.2017.733
16	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 850-865.
17	LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4282-4291. DOI: 10.1109/cvpr.2019.00441
18	ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11213. Cham: Springer, 2018: 103-119.
19	WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1328-1338. DOI: 10.1109/cvpr.2019.00142
20	CHEN Z, ZHONG B, LI G, et al. Siamese box adaptive network for visual tracking[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6667-6676. DOI: 10.1109/cvpr42600.2020.00670
21	YAN B, PENG H, FU J, et al. Learning spatio-temporal Transformer for visual tracking[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10428-10437. DOI: 10.1109/iccv48922.2021.01028
22	WANG N, ZHOU W, WANG J, et al. Transformer meets tracker: exploiting temporal context for robust visual tracking[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 1571-1580. DOI: 10.1109/cvpr46437.2021.00162
23	BLATTER P, KANAKIS M, DANELLJAN M, et al. Efficient visual tracking with Exemplar Transformers[C]// Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2023: 1571-1581. DOI: 10.1109/wacv56688.2023.00162
24	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
25	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
26	RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. DOI: 10.1007/s11263-015-0816-y
27	HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577. DOI: 10.1109/tpami.2019.2957464
28	CHEN X, YAN B, ZHU J, et al. Transformer tracking[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8122-8131. DOI: 10.1109/cvpr46437.2021.00803
29	CHOI J, CHANG H J, JEONG J, et al. Visual tracking using attention-modulated disintegration and integration[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4321-4330. DOI: 10.1109/cvpr.2016.468
	NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302.
30	GRABNER H, GRABNER M, BISCHOF H. Real-time tracking via on-line boosting[EB/OL]. [2022-11-20]. . DOI: 10.5244/c.20.6
31	ZHANG J, MA S, SCLAROFF S. MEEM: robust tracking via multiple experts using entropy minimization[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8694. Cham: Springer, 2014: 188-203.
32	ZHANG Z, PENG H, FU J, et al. Ocean: object-aware anchor-free tracking[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12366. Cham: Springer, 2020: 771-787.
33	CHEN X, YAN B, ZHU J, et al. Transformer tracking[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 8122-8131. DOI: 10.1109/cvpr46437.2021.00803

Algorithm	Occlusion		Deformation		Motion blur		Fast motion		Low resolution
	Precision	Success rate	Precision	Success rate	Precision	Success rate	Precision	Success rate	Precision	Success rate
SCT [29]	0.726	0.460	0.676	0.425	0.421	0.260	0.500	0.317	0.666	0.414
KCF_AST [6]	0.772	0.469	0.805	0.416	0.582	0.368	0.645	0.421	0.783	0.475
MDNet_AST [6]	0.803	0.507	0.794	0.519	0.717	0.464	0.809	0.537	0.805	0.527
ECO [15]	0.757	0.480	0.777	0.508	0.696	0.453	0.770	0.514	0.900	0.587
SiamTrans	0.796	0.571	0.826	0.534	0.763	0.525	0.836	0.570	0.844	0.599
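The tables in this section report precision and success rate without restating their definitions. Assuming the conventional OTB-style definitions, precision is the fraction of frames whose predicted box center lies within a pixel threshold (commonly 20 px) of the ground-truth center, and success rate is the area under the success plot, i.e. the mean, over IoU thresholds evenly spaced in [0, 1], of the fraction of frames whose IoU exceeds each threshold. The sketch below implements those definitions; the function names, the (x, y, w, h) box format and the default thresholds are assumptions, and a tiny-object benchmark may use a different pixel threshold.

```python
import math


def center_error(pred, gt):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    px, py = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    return math.hypot(px - gx, py - gy)


def iou(pred, gt):
    """Intersection over union of two (x, y, w, h) boxes."""
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0


def precision(preds, gts, threshold=20.0):
    """Fraction of frames whose center location error is <= threshold pixels."""
    errors = [center_error(p, g) for p, g in zip(preds, gts)]
    return sum(e <= threshold for e in errors) / len(errors)


def success_rate(preds, gts, num_thresholds=21):
    """Area under the success plot: mean fraction of frames with IoU > t,
    for t evenly spaced over [0, 1] (21 points = steps of 0.05)."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)
```

With one perfectly overlapping frame and one completely missed frame, `precision` gives 0.5 and `success_rate` gives 10/21, because the perfect frame fails only the strict IoU > 1.0 threshold.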

Algorithm	Small112		UAV20L
	Success rate	Precision	Success rate	Precision
SAMF [14]	—	—	0.380	0.457
KCF [1]	0.416	0.580	0.202	0.321
KCF_AST [6]	0.492	0.710	0.204	0.345
ECO [15]	0.629	0.779	—	—
CSK [13]	0.429	0.585	0.177	0.309
DaSiamRPN_AST [6]	0.693	0.805	0.705	0.717
SiamTrans	0.687	0.809	0.710	0.721

Algorithm	Overall	Scale variation	Fast motion	Target disappearance	Illumination variation	Camera motion	Motion blur
ECO [15]	0.3260	0.3200	0.1870	0.1120	0.3820	0.2950	0.0802
Ocean [32]	0.3430	0.3570	0.2090	0.1250	0.3950	0.2720	0.0936
SiamRPN++ [17]	0.3590	0.3680	0.1960	0.1160	0.4010	0.2880	0.0828
SiamBAN [20]	0.3490	0.3730	0.2120	0.1110	0.4200	0.2860	0.0976
MKDNet [7]	0.4130	0.4410	0.2640	0.1710	0.4740	0.3780	0.1410
SiamTrans	0.4190	0.4388	0.2703	0.1705	0.4803	0.3806	0.1478


Variant	Small90		UAV123_10fps
	Success rate	Precision	Success rate	Precision
Cross-correlation operation	0.502	0.756	0.596	0.815
2-layer FEM-FDM	0.469	0.738	0.600	0.812
4-layer FEM-FDM	0.485	0.747	0.583	0.784
6-layer FEM-FDM	0.507	0.768	0.603	0.811
8-layer FEM-FDM	0.476	0.740	0.594	0.807

PM layers	Small90		UAV123_10fps
	Success rate	Precision	Success rate	Precision
0	0.487	0.745	0.605	0.802
2	0.487	0.749	0.611	0.814
4	0.492	0.745	0.621	0.819
6	0.505	0.772	0.621	0.822
8	0.025	0.042	0.017	0.033
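The PM ablation table varies the number of stacked layers in the prediction module from 0 to 8. A minimal sketch, assuming each PM layer is a residual single-head self-attention step over the flattened (C, H, W) feature map of one prediction branch. The names `prediction_module`, `flatten` and `unflatten`, the residual form, and the absence of learned projections, normalization and feed-forward sublayers are hypothetical simplifications, not the paper's design.

```python
import math


def self_attention(tokens):
    """Single-head scaled dot-product self-attention over a list of tokens.

    Assumption: no learned Q/K/V projections, so tokens serve as their own
    queries, keys and values.
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(dim)])
    return out


def flatten(feature_map):
    """Turn a (C, H, W) nested list into H*W tokens of length C (row-major)."""
    c, h, w = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[k][i][j] for k in range(c)] for i in range(h) for j in range(w)]


def unflatten(tokens, h, w):
    """Inverse of flatten: H*W tokens of length C back to (C, H, W)."""
    c = len(tokens[0])
    return [[[tokens[i * w + j][k] for j in range(w)] for i in range(h)] for k in range(c)]


def prediction_module(feature_map, num_layers=6):
    """Hypothetical PM: refine one branch's (C, H, W) feature map with
    `num_layers` residual self-attention layers (0 layers = identity)."""
    h, w = len(feature_map[0]), len(feature_map[0][0])
    tokens = flatten(feature_map)
    for _ in range(num_layers):
        attended = self_attention(tokens)
        tokens = [[x + y for x, y in zip(t, a)] for t, a in zip(tokens, attended)]
    return unflatten(tokens, h, w)
```

`prediction_module(branch_map, num_layers=0)` returns the input unchanged, mirroring the 0-layer row; the default of 6 is used only because that row scores best in the table. The sketch makes no attempt to model or explain the collapse reported for 8 layers.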
