Survey of single target tracking algorithms based on Siamese network

Journal of Computer Applications, 2023, Vol. 43, Issue (3): 661-673. DOI: 10.11772/j.issn.1001-9081.2022010150

Special Issue: Artificial intelligence; Survey

Mengting WANG, Wenzhong YANG, Yongzhi WU

School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

Received: 2022-02-11; Revised: 2022-04-28; Accepted: 2022-05-05; Online: 2022-05-24; Published: 2023-03-10
Contact: Wenzhong YANG
About the authors: WANG Mengting, born in 1995, M. S. candidate. Her research interests include single object tracking and computer vision.
YANG Wenzhong, born in 1971, Ph. D., associate professor. His research interests include image processing.
WU Yongzhi, born in 1995, M. S. candidate. His research interests include person re-identification and computer vision.
Supported by: Major Project of Science and Technology of Xinjiang Uygur Autonomous Region (2020A02001-1); Science and Technology Program of Xinjiang Uygur Autonomous Region (202104120007); Natural Science Foundation of Jiangxi Province (20202BAB202023)

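Chinese title: 基于孪生网络的单目标跟踪算法综述
Chinese author names: 王梦亭 (WANG Mengting), 杨文忠 (YANG Wenzhong), 武雍智 (WU Yongzhi)

Author profiles (Chinese-language version):
WANG Mengting (1995-), female, from Zhoukou, Henan, M. S. candidate. Research interests: single object tracking, computer vision.
YANG Wenzhong (1971-), male, from Nanyang, Henan, associate professor, Ph. D., CCF member. Research interests: image processing.
WU Yongzhi (1995-), male, from Zhangye, Gansu, M. S. candidate. Research interests: person re-identification, computer vision.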

Abstract

Single object tracking is an important research direction in computer vision, with wide applications in video surveillance, autonomous driving and other fields. Many surveys of single object tracking algorithms have been published, but most of them focus on correlation filter-based or deep learning-based methods. In recent years, Siamese network-based tracking algorithms have received extensive attention from researchers for their balance between accuracy and speed. However, relatively few surveys cover this type of algorithm, and a systematic analysis of these algorithms at the architectural level is lacking. To gain a deep understanding of single object tracking algorithms based on Siamese networks, a large body of related literature was organized and analyzed. Firstly, the structures and applications of Siamese networks were described, and the tracking algorithms were introduced according to a classification of the components of Siamese tracking architectures. Then, the commonly used datasets and evaluation metrics in single object tracking were listed. The overall and per-attribute performance of 25 mainstream tracking algorithms was compared and analyzed on the OTB2015 (Object Tracking Benchmark) dataset, and the performance and inference speed of 23 Siamese network-based tracking algorithms on the LaSOT (Large-scale Single Object Tracking) and GOT-10K (Generic Object Tracking) test sets were listed. Finally, the research on Siamese network-based tracking algorithms was summarized, and possible future research directions for this type of algorithm were discussed.

Key words: Siamese network, single target tracking, computer vision, cross-correlation, anchor-free
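Siamese trackers such as SiamFC [16] locate the target by cross-correlating an embedding of the template (exemplar) image with an embedding of a larger search region. The peak of the resulting response map then gives the new target position. The pure-Python sketch below illustrates only this matching step and is not any paper's implementation. The small hand-written feature maps are hypothetical stand-ins for CNN outputs. The learned bias term and the response-map upsampling used in SiamFC are omitted.

```python
# Minimal sketch of SiamFC-style matching: the template embedding is slid over
# the search-region embedding, and every output cell is the channel-summed dot
# product of the template with the window under it (valid-mode correlation).
# The small hand-written feature maps are hypothetical stand-ins for CNN outputs.
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def cross_correlate(template: FeatureMap, search: FeatureMap) -> List[List[float]]:
    """Return the response map of `template` correlated over `search`."""
    if len(template) != len(search):
        raise ValueError("template and search must have the same number of channels")
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    out_h, out_w = sh - th + 1, sw - tw + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("search region must be at least as large as the template")
    response = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            score = 0.0
            for t_ch, s_ch in zip(template, search):  # sum over channels
                for u in range(th):
                    for v in range(tw):
                        score += t_ch[u][v] * s_ch[i + u][j + v]
            response[i][j] = score
    return response


def peak_location(response: List[List[float]]) -> Tuple[int, int]:
    """(row, col) of the highest score, i.e. the best-matching offset."""
    cells = ((i, j) for i in range(len(response)) for j in range(len(response[0])))
    return max(cells, key=lambda ij: response[ij[0]][ij[1]])


if __name__ == "__main__":
    # One-channel 2x2 template; the same diagonal pattern appears in the
    # 4x4 search region starting at row 1, column 2.
    template = [[[1.0, 0.0],
                 [0.0, 1.0]]]
    search = [[[0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0, 0.0]]]
    print(peak_location(cross_correlate(template, search)))  # (1, 2)
```

Running the script prints the (row, col) offset of the best match inside the search region. A real tracker would typically map this offset back to image coordinates using the network's total stride.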

CLC Number: TP181

Mengting WANG, Wenzhong YANG, Yongzhi WU. Survey of single target tracking algorithms based on Siamese network[J]. Journal of Computer Applications, 2023, 43(3): 661-673.

Figures/Tables 12

References 102

1	EMAMI A， DADGOSTAR F， BIGDELI A， et al. Role of spatiotemporal oriented energy features for robust visual tracking in video surveillance［C］// Proceedings of the IEEE 9th International Conference on Advanced Video and Signal-Based Surveillance. Piscataway： IEEE， 2012： 349-354. 10.1109/avss.2012.64
2	XING J L， AI H Z， LAO S H. Multiple human tracking based on multi-view upper-body detection and discriminative learning［C］// Proceedings of the 20th International Conference on Pattern Recognition. Piscataway： IEEE， 2010： 1698-1701. 10.1109/icpr.2010.420
3	XU R Y， GUAN Y P， HUANG Y Z. Multiple human detection and tracking based on head detection for real-time video surveillance［J］. Multimedia Tools and Applications， 2015， 74（3）： 729-742. 10.1007/s11042-014-2177-x
4	LEE K H， HWANG J N. On-road pedestrian tracking across multiple driving recorders［J］. IEEE Transactions on Multimedia， 2015， 17（9）： 1429-1438. 10.1109/tmm.2015.2455418
5	GAO M， JIN L S， JIANG Y Y， et al. Manifold Siamese network： a novel visual tracking convnet for autonomous vehicles［J］. IEEE Transactions on Intelligent Transportation Systems， 2020， 21（4）： 1612-1623. 10.1109/tits.2019.2930337
6	LIU L W， XING J L， AI H Z， et al. Hand posture recognition using finger geometric feature［C］// Proceedings of the 21st International Conference on Pattern Recognition. Piscataway： IEEE， 2012： 565-568.
7	ROBIN C， LACROIX S. Multi-robot target detection and tracking： taxonomy and survey［J］. Autonomous Robots， 2016， 40（4）： 729-760. 10.1007/s10514-015-9491-7
8	MANAFIFARD M， EBADI H， ABRISHAMI MOGHADDAM H. A survey on player tracking in soccer videos［J］. Computer Vision and Image Understanding， 2017， 159： 19-46. 10.1016/j.cviu.2017.02.002
9	LUO J H， HAN Y， FAN L Y. Underwater acoustic target tracking： a review［J］. Sensors， 2018， 18（1）： No.112. 10.3390/s18010112
10	MENG X Y， DUAN J M. Advances in correlation filter-based object tracking algorithms： a review［J］. Journal of Beijing University of Technology， 2020， 46（12）： 1393-1416. （in Chinese） 10.11936/bjutxb2019030011
11	HENRIQUES J F， CASEIRO R， MARTINS P， et al. High-speed tracking with kernelized correlation filters［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2015， 37（3）： 583-596. 10.1109/tpami.2014.2345390
12	DANELLJAN M， HÄGER G， KHAN F S， et al. Accurate scale estimation for robust visual tracking［C］// Proceedings of the 2014 British Machine Vision Conference. Durham： BMVA Press， 2014： No.65. 10.5244/c.28.65
13	DANELLJAN M， HÄGER G， KHAN F S， et al. Convolutional features for correlation filter based visual tracking［C］// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway： IEEE， 2015： 621-629. 10.1109/iccvw.2015.84
14	DANELLJAN M， BHAT G， KHAN F S， et al. ECO： efficient convolution operators for tracking［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 6931-6939. 10.1109/cvpr.2017.733
15	NAM H， HAN B. Learning multi-domain convolutional neural networks for visual tracking［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 4293-4302. 10.1109/cvpr.2016.465
16	BERTINETTO L， VALMADRE J， HENRIQUES J F， et al. Fully-convolutional Siamese networks for object tracking［C］// Proceedings of the 2016 European Conference on Computer Vision， LNCS 9914. Cham： Springer， 2016： 850-865.
17	LI B， YAN J J， WU W， et al. High performance visual tracking with Siamese region proposal network［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 8971-8980. 10.1109/cvpr.2018.00935
18	BROMLEY J， BENTZ J W， BOTTOU L， et al. Signature verification using a “Siamese” time delay neural network［J］. International Journal of Pattern Recognition and Artificial Intelligence， 1993， 7（4）： 669-688.
19	CHOPRA S， HADSELL R， LeCUN Y. Learning a similarity metric discriminatively， with application to face verification［C］// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition， Volume I. Piscataway： IEEE， 2005： 539-546.
20	TAIGMAN Y， YANG M， RANZATO M， et al. DeepFace： closing the gap to human-level performance in face verification［C］// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2014： 1701-1708. 10.1109/cvpr.2014.220
21	LIN T Y， CUI Y， BELONGIE S， et al. Learning deep representations for ground-to-aerial geolocalization［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 5007-5015. 10.1109/cvpr.2015.7299135
22	HAN X F， LEUNG T， JIA Y Q， et al. MatchNet： unifying feature and metric learning for patch-based matching［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 3279-3286. 10.1109/cvpr.2015.7298948
23	ZAGORUYKO S， KOMODAKIS N. Learning to compare image patches via convolutional neural networks［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 4353-4361. 10.1109/cvpr.2015.7299064
24	ŽBONTAR J， LeCUN Y. Computing the stereo matching cost with a convolutional neural network［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 1592-1599. 10.1109/cvpr.2015.7298767
25	KOCH G， ZEMEL R， SALAKHUTDINOV R. Siamese neural networks for one-shot image recognition［EB/OL］. ［2022-04-18］..
26	TAO R， GAVVES E， SMEULDERS A W M. Siamese instance search for tracking［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 1420-1429. 10.1109/cvpr.2016.158
27	VALMADRE J， BERTINETTO L， HENRIQUES J， et al. End-to-end representation learning for correlation filter based tracking［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 5000-5008. 10.1109/cvpr.2017.531
28	HE A F， LUO C， TIAN X M， et al. A twofold Siamese network for real-time object tracking［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 4834-4843. 10.1109/cvpr.2018.00508
29	WANG Q， TENG Z， XING J L， et al. Learning attentions： residual attentional Siamese network for high performance online visual tracking［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 4854-4863. 10.1109/cvpr.2018.00510
30	ZHANG Y H， WANG L J， QI J Q， et al. Structured Siamese network for real-time visual tracking［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11213. Cham： Springer， 2018： 355-370.
31	WANG G T， LUO C， XIONG Z W， et al. SPM-Tracker： series-parallel matching for real-time visual object tracking［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 3638-3647. 10.1109/cvpr.2019.00376
32	FAN H， LING H B. Siamese cascaded region proposal networks for real-time visual tracking［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 7944-7953. 10.1109/cvpr.2019.00814
33	SUNG F， YANG Y X， ZHANG L， et al. Learning to compare： relation network for few-shot learning［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 1199-1208. 10.1109/cvpr.2018.00131
34	KRIZHEVSKY A， SUTSKEVER I， HINTON G E. ImageNet classification with deep convolutional neural networks［C］// Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1. Red Hook， NY： Curran Associates Inc.， 2012： 1097-1105.
35	LI B， WU W， WANG Q， et al. SiamRPN++： evolution of Siamese visual tracking with very deep networks［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 4277-4286. 10.1109/cvpr.2019.00441
36	ZHANG Z P， PENG H W. Deeper and wider Siamese networks for real-time visual tracking［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 4586-4595. 10.1109/cvpr.2019.00472
37	HE K M， ZHANG X Y， REN S Q， et al. Deep residual learning for image recognition［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 770-778. 10.1109/cvpr.2016.90
38	XIE S N， GIRSHICK R， DOLLÁR P， et al. Aggregated residual transformations for deep neural networks［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 5987-5995. 10.1109/cvpr.2017.634
39	SZEGEDY C， LIU W， JIA Y Q， et al. Going deeper with convolutions［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 1-9. 10.1109/cvpr.2015.7298594
40	GUPTA D K， ARYA D， GAVVES E. Rotation equivariant Siamese networks for tracking［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 12357-12366. 10.1109/cvpr46437.2021.01218
41	SOSNOVIK I， MOSKALEV A， SMEULDERS A. Scale equivariance improves Siamese tracking［C］// Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway： IEEE， 2021： 2764-2773. 10.1109/wacv48630.2021.00281
42	HUANG C， LUCEY S， RAMANAN D. Learning policies for adaptive tracking with deep feature cascades［C］// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway： IEEE， 2017： 105-114. 10.1109/iccv.2017.21
43	YAN B， PENG H W， WU K， et al. LightTrack： finding lightweight neural networks for object tracking via one-shot architecture search［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 15175-15184. 10.1109/cvpr46437.2021.01493
44	WANG Z Q， XU J， LIU L， et al. RANet： ranking attention network for fast video object segmentation［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 3977-3986. 10.1109/iccv.2019.00408
45	YAN B， ZHANG X Y， WANG D， et al. Alpha-Refine： boosting tracking performance by precise bounding box estimation［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 5285-5294. 10.1109/cvpr46437.2021.00525
46	LIAO B Y， WANG C Y， WANG Y Y， et al. PG-Net： pixel to global matching network for visual tracking［C］// Proceedings of the 2020 European Conference on Computer Vision， LNCS 12367. Cham： Springer， 2020： 429-444.
47	ZHANG Z P， LIU Y H， WANG X， et al. Learn to match： automatic matching network design for visual tracking［C］// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2021： 13319-13328. 10.1109/iccv48922.2021.01309
48	GUO D Y， SHAO Y Y， CUI Y， et al. Graph attention tracking［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 9538-9547. 10.1109/cvpr46437.2021.00942
49	ZHOU Z K， PEI W J， LI X， et al. Saliency-associated object tracking［C］// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2021： 9846-9855. 10.1109/iccv48922.2021.00972
50	HAN W C， DONG X P， KHAN F S， et al. Learning to fuse asymmetric feature maps in Siamese trackers［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 16565-16575. 10.1109/cvpr46437.2021.01630
51	CHEN X， YAN B， ZHU J W， et al. Transformer tracking［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 8122-8131. 10.1109/cvpr46437.2021.00803
52	REN S Q， HE K M， GIRSHICK R， et al. Faster R-CNN： towards real-time object detection with region proposal networks［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2017， 39（6）： 1137-1149. 10.1109/tpami.2016.2577031
53	XU Y D， WANG Z Y， LI Z X， et al. SiamFC++： towards robust and accurate visual tracking with target estimation guidelines［C］// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2020： 12549-12556. 10.1609/aaai.v34i07.6944
54	GUO D Y， WANG J， CUI Y， et al. SiamCAR： Siamese fully convolutional classification and regression for visual tracking［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 6268-6276. 10.1109/cvpr42600.2020.00630
55	CHEN Z D， ZHONG B N， LI G R， et al. Siamese box adaptive network for visual tracking［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 6667-6676. 10.1109/cvpr42600.2020.00670
56	ZHANG Z P， PENG H W， FU J L， et al. Ocean： object-aware anchor-free tracking［C］// Proceedings of the 2020 European Conference on Computer Vision， LNCS 12366. Cham： Springer， 2020： 771-787.
57	DU F， LIU P， ZHAO W， et al. Correlation-guided attention for corner detection based visual tracking［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 6835-6844. 10.1109/cvpr42600.2020.00687
58	YANG Z， LIU S H， HU H， et al. RepPoints： point set representation for object detection［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 9656-9665. 10.1109/iccv.2019.00975
59	MA Z A， WANG L Y， ZHANG H T， et al. RPT： learning point set representation for Siamese visual tracking［C］// Proceedings of the 2020 European Conference on Computer Vision， LNCS 12539. Cham： Springer， 2020： 653-665.
60	WANG Q， ZHANG L， BERTINETTO L， et al. Fast online object tracking and segmentation： a unifying approach［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 1328-1338. 10.1109/cvpr.2019.00142
61	XU N， YANG L J， FAN Y C， et al. YouTube-VOS： sequence-to-sequence video object segmentation［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11209. Cham： Springer， 2018： 603-619.
62	BHAT G， DANELLJAN M， van GOOL L， et al. Learning discriminative model prediction for tracking［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 6181-6190. 10.1109/iccv.2019.00628
63	HELD D， THRUN S， SAVARESE S. Learning to track at 100 FPS with deep regression networks［C］// Proceedings of the 2016 European Conference on Computer Vision， LNCS 9905. Cham： Springer， 2016： 749-765.
64	YANG T Y， CHAN A B. Recurrent filter learning for visual tracking［C］// Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops. Piscataway： IEEE， 2017： 2010-2019. 10.1109/iccvw.2017.235
65	YANG T Y， CHAN A B. Learning dynamic memory networks for object tracking［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11213. Cham： Springer， 2018： 153-169.
66	FU Z H， LIU Q J， FU Z H， et al. STMTrack： template-free visual tracking with space-time memory networks［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 13769-13778. 10.1109/cvpr46437.2021.01356
67	GUO Q， FENG W， ZHOU C， et al. Learning dynamic Siamese network for visual object tracking［C］// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway： IEEE， 2017： 1781-1789. 10.1109/iccv.2017.196
68	ZHU Z， WANG Q， LI B， et al. Distractor-aware Siamese networks for visual object tracking［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11213. Cham： Springer， 2018： 103-119.
69	ZHANG L C， GONZALEZ-GARCIA A， van de WEIJER J， et al. Learning the model update for Siamese trackers［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 4009-4018. 10.1109/iccv.2019.00411
70	LI P X， CHEN B Y， OUYANG W L， et al. GradNet： gradient-guided network for visual object tracking［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 6161-6170. 10.1109/iccv.2019.00626
71	CHOI J， KWON J， LEE K M. Deep meta learning for real-time target-aware visual tracking［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 911-920. 10.1109/iccv.2019.00100
72	GAO J Y， ZHANG T Z， XU C S. Graph convolutional tracking［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 4644-4654. 10.1109/cvpr.2019.00478
73	WANG N， ZHOU W G， WANG J， et al. Transformer meets tracker： exploiting temporal context for robust visual tracking［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 1571-1580. 10.1109/cvpr46437.2021.00162
74	YAN B， PENG H W， FU J L， et al. Learning spatio-temporal transformer for visual tracking［C］// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2021： 10428-10437. 10.1109/iccv48922.2021.01028
75	SONG Y B， MA C， GONG L J， et al. CREST： convolutional residual learning for visual tracking［C］// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway： IEEE， 2017： 2574-2583. 10.1109/iccv.2017.279
76	YAO Y J， WU X H， ZHANG L， et al. Joint representation and truncated inference learning for correlation filter based tracking［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11213. Cham： Springer， 2018： 560-575.
77	ZHU Z， WU W， ZOU W， et al. End-to-end flow correlation tracking with spatial-temporal attention［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 548-557. 10.1109/cvpr.2018.00064
78	DANELLJAN M， BHAT G， KHAN F S， et al. ATOM： accurate tracking by overlap maximization［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 4655-4664. 10.1109/cvpr.2019.00479
79	DANELLJAN M， van GOOL L， TIMOFTE R. Probabilistic regression for visual tracking［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 7181-7190. 10.1109/cvpr42600.2020.00721
80	YANG T Y， XU P F， HU R B， et al. ROAM： recurrently optimizing tracking model［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 6717-6726. 10.1109/cvpr42600.2020.00675
81	VOIGTLAENDER P， LUITEN J， TORR P H S， et al. Siam R-CNN： visual tracking by re-detection［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 6577-6587. 10.1109/cvpr42600.2020.00661
82	WANG X， LI C L， LUO B， et al. SINT++： robust visual tracking via adversarial positive instance generation［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 4864-4873. 10.1109/cvpr.2018.00511
83	WANG N， SONG Y B， MA C， et al. Unsupervised deep tracking［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 1308-1317. 10.1109/cvpr.2019.00140
84	DONG X P， SHEN J B. Triplet loss in Siamese network for object tracking［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11217. Cham： Springer， 2018： 472-488.
85	WU Y， LIM J， YANG M H. Online object tracking： a benchmark［C］// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2013： 2411-2418. 10.1109/cvpr.2013.312
86	WU Y， LIM J， YANG M H. Object tracking benchmark［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2015， 37（9）： 1834-1848. 10.1109/tpami.2014.2388226
87	LIANG P P， BLASCH E， LING H B. Encoding color information for visual tracking： algorithms and benchmark［J］. IEEE Transactions on Image Processing， 2015， 24（12）： 5630-5644. 10.1109/tip.2015.2482905
88	GALOOGAHI H K， FAGG A， HUANG C， et al. Need for speed： a benchmark for higher frame rate object tracking［C］// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway： IEEE， 2017： 1134-1143. 10.1109/iccv.2017.128
89	LI S Y， YEUNG D Y. Visual object tracking for unmanned aerial vehicles： a benchmark and new motion models［C］// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2017： 4140-4146. 10.1609/aaai.v31i1.11205
90	KRISTAN M， LEONARDIS A， MATAS J， et al. The sixth Visual Object Tracking VOT2018 challenge results［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11129. Cham： Springer， 2019： 3-53.
91	KRISTAN M， LEONARDIS A， MATAS J， et al. The eighth Visual Object Tracking VOT2020 challenge results［C］// Proceedings of the 2020 European Conference on Computer Vision， LNCS 12539. Cham： Springer， 2020： 547-601.
92	MÜLLER M， BIBI A， GIANCOLA S， et al. TrackingNet： a large-scale dataset and benchmark for object tracking in the wild［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11205. Cham： Springer， 2018： 310-327.
93	HUANG L H， ZHAO X， HUANG K Q. GOT-10k： a large high-diversity benchmark for generic object tracking in the wild［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2021， 43（5）： 1562-1577. 10.1109/tpami.2019.2957464
94	MILLER G A. WordNet： a lexical database for English［J］. Communications of the ACM， 1995， 38（11）： 39-41. 10.1145/219717.219748
95	FAN H， LIN L T， YANG F， et al. LaSOT： a high-quality large-scale single object tracking benchmark［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 5369-5378. 10.1109/cvpr.2019.00552
96	WANG X， SHU X J， ZHANG Z P， et al. Towards more flexible and accurate object tracking with natural language： algorithms and benchmark［C］// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2021： 13758-13768. 10.1109/cvpr46437.2021.01355
97	RUSSAKOVSKY O， DENG J， SU H， et al. ImageNet large scale visual recognition challenge［J］. International Journal of Computer Vision， 2015， 115（3）： 211-252. 10.1007/s11263-015-0816-y
98	REAL E， SHLENS J， MAZZOCCHI S， et al. YouTube-BoundingBoxes： a large high-precision human-annotated data set for object detection in video［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 7464-7473. 10.1109/cvpr.2017.789
99	LIN T Y， MAIRE M， BELONGIE S， et al. Microsoft COCO： common objects in context［C］// Proceedings of the 2014 European Conference on Computer Vision， LNCS 8693. Cham： Springer， 2014： 740-755.
100	LI A N， LIN M， WU Y， et al. NUS-PRO： a new visual tracking challenge［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2016， 38（2）： 335-349. 10.1109/tpami.2015.2417577
101	SMEULDERS A W M， CHU D M， CUCCHIARA R， et al. Visual tracking： an experimental survey［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2014， 36（7）： 1442-1468. 10.1109/tpami.2013.230
102	LI C L， LIANG X Y， LU Y J， et al. RGB-T object tracking： benchmark and baseline［J］. Pattern Recognition， 2019， 96： No.106977. 10.1016/j.patcog.2019.106977

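Type	Algorithm	Features used	AUC	CPU speed (frame/s)	GPU speed (frame/s)
Traditional correlation filter algorithm	KCF	Raw pixels, HOG	0.477	172	—
Traditional correlation filter algorithm	FDSST	HOG	0.551	54.3	—
Correlation filter algorithm with deep features	DeepSRDCF	HOG, deep appearance features	0.635	—	0.2
Correlation filter algorithm with deep features	ECO	HOG, CN, deep appearance features	0.691	—	8
Other deep learning algorithm	MDNet	Deep appearance features	0.678	—	1
Siamese network-based algorithm	SiamFC	Deep appearance features	0.582	—	86
Siamese network-based algorithm	SiamRPN	Deep appearance features	0.637	—	160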
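In OTB-style evaluations, AUC usually denotes the area under the success plot. For each overlap threshold, the fraction of frames whose predicted box overlaps the ground truth by more than that threshold is computed, and these fractions are then averaged over all thresholds. The minimal pure-Python sketch below makes the following assumptions:

- boxes are given as (x, y, w, h);
- 21 evenly spaced thresholds from 0 to 1 are used;
- a frame counts as a success only when its overlap is strictly greater than the threshold.

The function names are illustrative only, and the official benchmark toolkits may handle such details differently.

```python
# Sketch of an OTB-style success plot and its area under the curve (AUC).
# Assumptions: boxes are (x, y, w, h); 21 evenly spaced thresholds in [0, 1];
# a frame counts as a success when its IoU is strictly above the threshold.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_curve(pred: Sequence[Box], gt: Sequence[Box],
                  num_thresholds: int = 21) -> List[float]:
    """Fraction of frames whose IoU exceeds each threshold."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and equally long")
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    return [sum(1 for o in overlaps if o > t) / len(overlaps) for t in thresholds]


def success_auc(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """AUC of the success plot, taken as the mean success rate over thresholds."""
    curve = success_curve(pred, gt)
    return sum(curve) / len(curve)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (5, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (0, 0, 10, 10)]
    print(round(iou(pred[1], gt[1]), 4))    # 0.3333
    print(round(success_auc(pred, gt), 4))  # 0.6429
```

Averaging the success rates over evenly spaced thresholds is a simple way to approximate the area under the curve. With a fixed number of thresholds, it also keeps scores comparable across trackers.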

Category	Dataset	Training videos	Test videos	Total frames (×10⁶)	Min frames	Mean frames	Max frames	Frame rate (frame/s)	Overlapping datasets	Attributes
Early datasets	OTB2013	—	51	0.029	71	578	3872	30	VOT, OTB2015, TColor-128	IV, SV, OCC, DEF, MB, FM, IPR, OPR, OV, BC, LR
Early datasets	OTB2015	—	100	0.059	71	590	3872	30	VOT, OTB2013, TColor-128	
Early datasets	TColor-128	—	128	0.055	71	429	3872	30	OTB, VOT	
Early datasets	UAV123	—	123	0.113	109	915	3085	30	VOT, UAV20L	ARC, BC, CM, FM, FOC, IV, LR, OV, POC, SOB, SV, VC
Early datasets	UAV20L	—	20	0.059	1717	2934	5527	30	VOT, UAV123	ARC, BC, CM, FM, FOC, IV, LR, OV, POC, SOB, SV, VC
Early datasets	NfS	—	100	0.383	169	3830	20665	240	YouTube	IV, SV, OCC, DEF, FM, OV, BC, LR, VC
Early datasets	VOT2018	—	60	0.02136	41	356	1500	30	NUS-PRO [100], ALOV++ [101], OTB, TColor-128, UAV123	OCO, SCO, ARC, CM, MOC, DEF, AM, MB, BC, IV, SV, OCC
Early datasets	VOT2020	—	60	—	—	—	—	30	OTB, VOT, ALOV++, UAV123, NUS-PRO, TColor-128, RGBT234 [102]	OCC, IV, MOC, SZ, CM
Large-scale datasets	TrackingNet	30132	511	14.43	—	480	—	30	YouTube-BB	IV, SV, DEF, MB, FM, IPR, OPR, OV, BC, LR, ARC, CM, FOC, POC, SOB
Large-scale datasets	GOT-10K	9335	180	1.5	29	149	1418	10	VOT, WordNet, ImageNet	IV, SV, OCC, FM, ARC, LR
Large-scale datasets	LaSOT	1120	280	3.87	1000	2502	11397	30	YouTube, ImageNet	IV, SV, DEF, MB, FM, OV, BC, LR, ARC, CM, FOC, POC, VC, ROT
Large-scale datasets	TNL2K	1300	700	1.24	21	622	18488	30	YouTube	CM, ROT, DEF, FOC, IV, OV, POC, VC, SV, BC, MB, ARC, LR, FM, AS, TC, MS

Algorithm	Venue	FM AUC (OTB2015)	MB AUC (OTB2015)	OV AUC (OTB2015)	IPR AUC (OTB2015)	OPR AUC (OTB2015)	LaSOT AUC	GOT-10K AO	Speed (frame/s)
SAOT	ICCV2021	0.703	0.716	0.663	0.726	0.702	0.616	0.640	29.00
AutoMatch	ICCV2021	0.721	0.732	0.687	0.714	0.705	0.583	0.652	50.00
STARK-ST50	ICCV2021	0.709	0.733	0.666	0.675	0.667	0.664	0.680	40.00
STMTrack	CVPR2021	0.729	0.740	0.667	0.730	0.707	0.606	0.642	37.00
TrDiMP	CVPR2021	0.715	0.736	0.691	0.721	0.701	0.639	0.671	26.00
TrSiam	CVPR2021	0.706	0.729	0.679	0.709	0.687	0.624	0.660	35.00
SiamGAT	CVPR2021	0.695	0.715	0.631	0.712	0.707	0.539	0.627	70.00
SiamBAN-ACM	CVPR2021	0.734	0.744	0.700	0.729	0.715	0.572	—	41.00
TransT	CVPR2021	0.720	0.744	0.684	0.694	0.674	0.649	0.671	50.00
CGACD	CVPR2020	0.702	0.713	0.636	0.722	0.704	0.518	—	70.00
SiamBAN	CVPR2020	0.687	0.698	0.640	0.717	0.687	0.514	—	40.00
SiamCAR	CVPR2020	0.703	0.715	0.661	0.703	0.679	—	0.569*	52.27
ROAM++	CVPR2020	0.679	0.660	0.622	0.664	0.659	0.447	0.465	20.00
PrDiMP-50	CVPR2020	0.699	0.728	0.656	0.705	0.686	0.598	0.634	30.00
Siam R-CNN	CVPR2020	0.702	0.735	0.677	0.699	0.684	0.648	0.649	4.70
Ocean	ECCV2020	0.668	0.681	0.613	0.697	0.677	0.560	0.611	25.00
SiamRPN++	CVPR2019	0.686	0.703	0.646	0.694	0.680	0.496	0.517*	35.00
SiamDW	CVPR2019	0.665	0.696	0.641	0.648	0.658	0.384	0.416	35.00
ATOM	CVPR2019	0.657	0.653	0.613	0.637	0.618	0.514	0.556*	30.00
DiMP-50	ICCV2019	0.682	0.699	0.620	0.689	0.660	0.569	0.611	43.00
GradNet	ICCV2019	0.624	0.645	0.583	0.627	0.628	0.365	—	80.00
SiamRPN	CVPR2018	0.606	0.627	0.550	0.636	0.631	0.433*	—	160.00
SiamFC	ECCVW2016	0.579	0.586	0.469	0.565	0.558	0.336*	0.348	86.00
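The GOT-10K results in the table above are given as average overlap (AO), and the speeds as frames per second. The sketch below shows how these quantities are commonly derived from per-frame overlaps and timing. It also computes the success rate SR at an overlap threshold, which GOT-10K reports alongside AO (at thresholds 0.5 and 0.75). This is a simplified illustration with the following assumptions:

- all frames are pooled together;
- a success requires an overlap strictly above the threshold;
- the per-frame IoU values are hypothetical.

The official GOT-10K toolkit handles further details, such as which frames are scored, that this sketch ignores.

```python
# Simplified GOT-10k-style summary statistics computed from per-frame
# overlaps (IoU values in [0, 1]). All frames are pooled together; this is
# an illustrative assumption, not the official toolkit's exact procedure.
from typing import Sequence


def average_overlap(overlaps: Sequence[float]) -> float:
    """AO: the mean IoU between predicted and ground-truth boxes."""
    if not overlaps:
        raise ValueError("need at least one overlap value")
    return sum(overlaps) / len(overlaps)


def success_rate(overlaps: Sequence[float], threshold: float = 0.5) -> float:
    """SR_t: fraction of frames whose IoU is strictly above threshold t."""
    if not overlaps:
        raise ValueError("need at least one overlap value")
    return sum(1 for o in overlaps if o > threshold) / len(overlaps)


def frames_per_second(num_frames: int, elapsed_seconds: float) -> float:
    """Tracking speed as processed frames divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_frames / elapsed_seconds


if __name__ == "__main__":
    ious = [1.0, 0.6, 0.4, 0.0]  # hypothetical per-frame overlaps
    print(average_overlap(ious))         # 0.5
    print(success_rate(ious, 0.5))       # 0.5
    print(success_rate(ious, 0.75))      # 0.25
    print(frames_per_second(300, 10.0))  # 30.0
```

AO rewards consistently tight boxes across the whole sequence, whereas SR only counts frames above a cut-off. This is why the two can rank trackers differently.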
