Multi-object tracking method based on dual-decoder Transformer

Journal of Computer Applications, 2023, Vol. 43, Issue 6: 1919-1929. DOI: 10.11772/j.issn.1001-9081.2022050753

Section: Multimedia computing and computer simulation

Li WANG¹, Shibin XUAN¹,², Xuyang QIN¹, Ziwei LI¹

1. School of Artificial Intelligence, Guangxi Minzu University, Nanning Guangxi 530006, China
2. Guangxi Key Laboratory of Hybrid Computation and IC Design and Analysis (Guangxi Minzu University), Nanning Guangxi 530006, China

Received: 2022-05-25 Revised: 2022-12-22 Accepted: 2022-12-29 Online: 2023-06-08 Published: 2023-06-10
Corresponding author: Shibin XUAN
About the authors: WANG Li, born in 1995, a native of Chengdu, Sichuan, M. S. candidate. Her research interests include multi-object tracking and computer vision.
XUAN Shibin, born in 1964, a native of Wuwei, Anhui, Ph. D., professor. His research interests include image processing and recognition. Email: xuanshibin@gxmzu.edu.cn
QIN Xuyang, born in 1995, a native of Yuncheng, Shanxi, M. S. candidate. His research interests include object tracking and deep learning.
LI Ziwei, born in 1997, a native of Huaibei, Anhui, M. S. candidate. Her research interests include image segmentation and computer vision.
Supported by: National Natural Science Foundation of China (61866003)

Abstract

The Multi-Object Tracking (MOT) task requires tracking multiple objects simultaneously while keeping each object's identity continuous. To address object occlusion, object ID Switch (IDSW) and object loss in current MOT methods, a Transformer-based MOT model was improved and a multi-object tracking method based on a dual-decoder Transformer was proposed. Firstly, a set of trajectories was generated by model initialization in the first frame, and in every subsequent frame attention was used to associate objects between frames. Secondly, two decoders were used to refine the tracked object information: one decoder detects objects and the other tracks them. Thirdly, after tracking was completed, histogram template matching was applied to recover lost objects. Finally, a Kalman filter was used to track and predict occluded objects, and the predicted occluded objects were associated with newly detected objects to keep the tracking results continuous. In addition, appearance statistics and motion feature modeling were added on top of TrackFormer to fuse the different structures. Experimental results on the MOT17 dataset show that, compared with TrackFormer, the proposed method increases the IDentity F1 Score (IDF1) by 0.87 percentage points and the Multiple Object Tracking Accuracy (MOTA) by 0.41 percentage points, and reduces the number of IDSWs by 16.3%. The proposed method also achieves competitive results on the MOT16 dataset (IDF1 65.13%, MOTA 65.39%) and the MOT20 dataset (IDF1 53.69%, MOTA 55.26%). These results indicate that the proposed method can effectively handle object occlusion, maintain object identity information and reduce identity loss.

Key words: Multi-Object Tracking (MOT), attention, Transformer, histogram, template matching, Kalman filter
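
The third step of the method recovers lost targets with histogram template matching after tracking. The abstract does not state the histogram type, the similarity measure or the acceptance threshold. The following is therefore only a minimal sketch under explicit assumptions:

- Templates are concatenated per-channel RGB histograms with 8 bins per channel.
- Similarity is measured with the Bhattacharyya coefficient.
- A lost track takes over the most similar candidate only if the score reaches a hypothetical `min_similarity=0.8`.

The names `color_histogram`, `bhattacharyya` and `recover_lost` are illustrative and are not taken from the paper.

```python
import math


def color_histogram(pixels, bins=8):
    """Concatenated per-channel histogram of integer (r, g, b) pixels in 0..255, normalised to sum to 1."""
    hist = [0.0] * (3 * bins)
    for px in pixels:
        for ch, value in enumerate(px):
            hist[ch * bins + value * bins // 256] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient of two normalised histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def recover_lost(lost_templates, candidates, min_similarity=0.8):
    """Map each lost track ID to the index of the most similar unassigned candidate patch.

    lost_templates: {track_id: histogram saved when the track was last visible}
    candidates: list of pixel lists (assumed: crops of detections no active track claimed)
    min_similarity: hypothetical acceptance threshold
    """
    recovered, used = {}, set()
    cand_hists = [color_histogram(p) for p in candidates]
    for track_id, template in lost_templates.items():
        best_j, best_score = None, min_similarity
        for j, hist in enumerate(cand_hists):
            if j in used:
                continue  # each candidate may revive at most one lost identity
            score = bhattacharyya(template, hist)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            recovered[track_id] = best_j
    return recovered


red_shirt = [(250, 10, 10)] * 50
blue_shirt = [(10, 10, 250)] * 50
print(recover_lost({3: color_histogram(red_shirt)}, [blue_shirt, red_shirt]))  # {3: 1}
```

Because a claimed candidate is skipped for later lost tracks, the same detection cannot be handed to two identities. A colour histogram ignores spatial layout, which makes it tolerant of pose changes. The trade-off is that two people wearing similar clothes can also score highly, so the threshold matters.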

CLC number: TP183

Li WANG, Shibin XUAN, Xuyang QIN, Ziwei LI. Multi-object tracking method based on dual-decoder Transformer[J]. Journal of Computer Applications, 2023, 43(6): 1919-1929.

References

1	ZVEJNIEKS P, BIRJUKOVS M, KLEVS M, et al. MHT-X: offline multiple hypothesis tracking with algorithm X[J]. Experiments in Fluids, 2022, 63(3): No.55. DOI: 10.1007/s00348-022-03399-5
2	HA N D, SHIMIZU I, BAO P T. Tracking objects based on multiple particle filters for multipart combined moving directions information[J]. Computational Intelligence and Neuroscience, 2020, 2020: No.8839725. DOI: 10.1155/2020/8839725
3	FROSSARD D, URTASUN R. End-to-end learning of multi-sensor 3D tracking by detection[C]// Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 2018: 635-642. DOI: 10.1109/icra.2018.8462884
4	XIANG Y, ALAHI A, SAVARESE S. Learning to track: online multi-object tracking by decision making[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4705-4713. DOI: 10.1109/iccv.2015.534
5	LI Z X, BILODEAU G A, BOUACHIR W. Multiple convolutional features in siamese networks for object tracking[J]. Machine Vision and Applications, 2021, 32(3): No.59. DOI: 10.1007/s00138-021-01185-7
6	TANG S Y, ANDRES B, ANDRILUKA M, et al. Subgraph decomposition for multi-target tracking[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 5033-5041. DOI: 10.1109/cvpr.2015.7299138
7	CHU Q, OUYANG W L, LI H S, et al. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 4846-4855. DOI: 10.1109/iccv.2017.518
8	SADEGHIAN A, ALAHI A, SAVARESE S. Tracking the untrackable: learning to track multiple cues with long-term dependencies[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 300-311. DOI: 10.1109/iccv.2017.41
9	BREITENSTEIN M D, REICHLIN F, LEIBE B, et al. Online multiperson tracking-by-detection from a single, uncalibrated camera[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(9): 1820-1833. DOI: 10.1109/tpami.2010.232
10	SHARMA S, ANSARI J A, MURTHY J K, et al. Beyond pixels: leveraging geometry and shape cues for online multi-object tracking[C]// Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 2018: 3508-3515. DOI: 10.1109/icra.2018.8461018
11	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
12	SUN P Z, CAO J K, JIANG Y, et al. TransTrack: multiple object tracking with transformer[EB/OL]. (2021-05-04)[2022-04-12].
13	MEINHARDT T, KIRILLOV A, LEAL-TAIXÉ L, et al. TrackFormer: multi-object tracking with Transformers[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8834-8844. DOI: 10.1109/cvpr52688.2022.00864
14	XU Y H, BAN Y T, DELORME G, et al. TransCenter: transformers with dense queries for multiple-object tracking[EB/OL]. (2022-09-30)[2022-10-11]. DOI: 10.1109/tpami.2022.3225078
15	SHENG H, ZHANG Y, CHEN J H, et al. Heterogeneous association graph fusion for target association in multiple object tracking[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(11): 3269-3280. DOI: 10.1109/tcsvt.2018.2882192
16	KIM C, LI F X, CIPTADI A, et al. Multiple hypothesis tracking revisited[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4696-4704. DOI: 10.1109/iccv.2015.533
17	KEUPER M, TANG S Y, ANDRES B, et al. Motion segmentation & multiple object tracking by correlation co-clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(1): 140-153. DOI: 10.1109/tpami.2018.2876253
18	YU Q, MEDIONI G, COHEN I. Multiple target tracking using spatio-temporal Markov chain Monte Carlo data association[C]// Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2007: 1-8. DOI: 10.1109/cvpr.2007.382991
19	TANG S Y, ANDRILUKA M, ANDRES B, et al. Multiple people tracking by lifted multicut and person re-identification[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3701-3710. DOI: 10.1109/cvpr.2017.394
20	BRASÓ G, LEAL-TAIXÉ L. Learning a neural solver for multiple object tracking[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6246-6256. DOI: 10.1109/cvpr42600.2020.00628
21	LEAL-TAIXÉ L, CANTON-FERRER C, SCHINDLER K. Learning by tracking: Siamese CNN for robust target association[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2016: 418-425. DOI: 10.1109/cvprw.2016.59
22	RISTANI E, TOMASI C. Features for multi-target multi-camera tracking and re-identification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6036-6046. DOI: 10.1109/cvpr.2018.00632
23	CHEN L, AI H Z, ZHUANG Z J, et al. Real-time multiple people tracking with deeply learned candidate selection and person re-identification[C]// Proceedings of the 2018 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2018: 1-6. DOI: 10.1109/icme.2018.8486597
24	CHU P, LING H B. FAMNet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6172-6181. DOI: 10.1109/iccv.2019.00627
25	BERGMANN P, MEINHARDT T, LEAL-TAIXÉ L. Tracking without bells and whistles[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 941-951. DOI: 10.1109/iccv.2019.00103
26	ZHOU X Y, KOLTUN V, KRÄHENBÜHL P. Tracking objects as points[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 474-490.
27	HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. DOI: 10.1109/iccv.2017.322
28	VOIGTLAENDER P, KRAUSE M, OSEP A, et al. MOTS: multi-object tracking and segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 7934-7943. DOI: 10.1109/cvpr.2019.00813
29	PORZI L, HOFINGER M, RUIZ I, et al. Learning multi-object tracking and segmentation from automatic annotations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6845-6854. DOI: 10.1109/cvpr42600.2020.00688
30	XU Z B, ZHANG W, TAN X, et al. Segment as points for efficient online multi-object tracking and segmentation[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 264-281.
31	ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. (2021-03-18)[2022-04-12].
32	MILAN A, LEAL-TAIXÉ L, REID I, et al. MOT16: a benchmark for multi-object tracking[EB/OL]. (2016-05-03)[2022-04-12].
33	DENDORFER P, REZATOFIGHI H, MILAN A, et al. MOT20: a benchmark for multi object tracking in crowded scenes[EB/OL]. (2020-03-19)[2022-04-12].
34	BERNARDIN K, STIEFELHAGEN R. Evaluating multiple object tracking performance: the CLEAR MOT metrics[J]. EURASIP Journal on Image and Video Processing, 2008, 2008: No.246309. DOI: 10.1155/2008/246309
35	MAHMOUDI N, AHADI S M, RAHMATI M. Multi-target tracking using CNN-based features: CNNMTT[J]. Multimedia Tools and Applications, 2019, 78(6): 7077-7096. DOI: 10.1007/s11042-018-6467-6
36	WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]// Proceedings of the 2017 IEEE International Conference on Image Processing. Piscataway: IEEE, 2017: 3645-3649. DOI: 10.1109/icip.2017.8296962
37	CHOI W. Near-online multi-target tracking with aggregated local flow descriptor[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 3029-3037. DOI: 10.1109/iccv.2015.347
38	HORNAKOVA A, HENSCHEL R, ROSENHAHN B, et al. Lifted disjoint paths with application in multiple object tracking[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 4364-4375.
39	YU F W, LI W B, LI Q Q, et al. POI: multiple object tracking with high performance detection and appearance feature[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 36-42.
40	FANG K, XIANG Y, LI X C, et al. Recurrent autoregressive networks for online multi-object tracking[C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 466-475. DOI: 10.1109/wacv.2018.00057
41	WANG Z D, ZHENG L, LIU Y X, et al. Towards real-time multi-object tracking[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12356. Cham: Springer, 2020: 107-122.
42	PENG J L, WANG C A, WAN F B, et al. Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 145-161.
43	PANG B, LI Y Z, ZHANG Y F, et al. TubeTK: adopting tubes to track multi-object in a one-step training model[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6307-6317. DOI: 10.1109/cvpr42600.2020.00634
44	BEWLEY A, GE Z Y, OTT L, et al. Simple online and realtime tracking[C]// Proceedings of the 2016 IEEE International Conference on Image Processing. Piscataway: IEEE, 2016: 3464-3468. DOI: 10.1109/icip.2016.7533003
45	ZHANG Y, SHENG H, WU Y B, et al. Multiplex labeling graph for near-online tracking in crowded scenes[J]. IEEE Internet of Things Journal, 2020, 7(9): 7892-7902. DOI: 10.1109/jiot.2020.2996609

Comparison with other methods on the MOT16, MOT17 and MOT20 datasets (↑: higher is better; ↓: lower is better; MT/ML: mostly tracked/mostly lost trajectories)
Dataset	Method	IDF1/%↑	MOTA/%↑	ML/%↓	MT/%↑	IDSW↓
MOT16	MPNTrack	58.60	51.50	25.90	31.20	375
	CNNMTT	62.20	65.20	21.30	34.00	946
	DeepSORT	62.20	61.40	18.20	32.80	1 423
	NOMTwSDP16	62.60	62.20	31.10	32.50	406
	Lif_T	64.70	61.30	34.00	27.00	1 389
	POI	65.10	66.10	20.80	39.00	3 093
	RAR16wVGG	63.80	63.00	22.10	39.90	482
	JDE	55.80	64.40	20.00	35.40	1 544
	Proposed method	65.13	65.39	15.80	34.00	971
MOT17	TransTrack	56.90	65.80	21.80	32.20	5 355
	CTracker	57.40	66.60	24.20	32.22	5 529
	TubeTK	58.60	63.00	19.87	31.21	4 137
	TrackFormer	63.90	65.00	13.76	45.60	3 528
	CenterTrack	64.70	67.80	24.58	34.65	3 039
	Proposed method	64.77	65.41	13.63	45.22	2 952
MOT20	SORT20	45.10	42.70	26.20	16.70	4 334
	MLT	48.90	54.60	22.10	30.90	2 187
	Tracktor++V2	52.70	52.60	26.70	29.40	1 648
	Proposed method	53.69	55.26	23.90	36.00	1 169
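
The MOT17 gains quoted in the abstract follow from the TrackFormer row and the proposed-method row of this table. The IDF1 and MOTA gains are differences in percentage points, and the IDSW reduction is relative to TrackFormer's count. For reference, the sketch also writes out two standard metric definitions as small helpers:

- CLEAR MOT accuracy: MOTA = 1 - (FN + FP + IDSW) / GT.
- Identity F1 score: IDF1 = 2 IDTP / (2 IDTP + IDFP + IDFN).

The helper names are illustrative.

```python
def mota(fn: int, fp: int, idsw: int, gt: int) -> float:
    """CLEAR MOT accuracy in percent: 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    return (1 - (fn + fp + idsw) / gt) * 100


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score in percent: 2 IDTP / (2 IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn) * 100


# MOT17 rows copied from the table above.
trackformer = {"IDF1": 63.90, "MOTA": 65.00, "IDSW": 3528}
proposed = {"IDF1": 64.77, "MOTA": 65.41, "IDSW": 2952}

idf1_gain = round(proposed["IDF1"] - trackformer["IDF1"], 2)  # percentage points
mota_gain = round(proposed["MOTA"] - trackformer["MOTA"], 2)  # percentage points
idsw_cut = round((trackformer["IDSW"] - proposed["IDSW"]) / trackformer["IDSW"] * 100, 1)  # percent

print(idf1_gain, mota_gain, idsw_cut)  # 0.87 0.41 16.3
```

The IDSW figure is a relative reduction: 576 fewer switches out of 3 528. It should not be read as percentage points like the other two numbers.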

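Ablation results of each component
Method	MOTA/%	IDF1/%	IDSW
TrackFormer	68.1	67.5	2 097
TrackFormer+Kalman (without camera-motion check)	67.5	64.5	2 172
TrackFormer+Kalman (applied when the camera is judged static)	68.6	67.8	2 013
TrackFormer+histogram matching	68.3	69.2	1 689
TrackFormer+dual decoder	68.9	67.6	1 794
All components (full method)	68.9	69.7	1 650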
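
Rows 2 and 3 of the ablation table compare two ways of adding the Kalman filter:

- With camera motion ignored, it hurts (MOTA 67.5%, IDF1 64.5%).
- When it is applied only where the camera is judged static, it helps (MOTA 68.6%, IDF1 67.8%).

A plausible reason is that a constant-velocity model in image coordinates breaks down when the camera itself moves.

The sketch below illustrates only the occlusion-handling idea. It rests on assumptions not stated in the paper:

- A constant-velocity Kalman filter runs per box-centre coordinate, with hand-picked noise values `q` and `r`.
- Box width and height stay frozen while the target is occluded.
- Predicted boxes are matched greedily to new detections by IoU, using a hypothetical threshold of 0.3.

The camera-motion check is not modeled. `AxisKF`, `OccludedTrack` and `reassociate` are illustrative names, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one box-centre coordinate (state: position, velocity)."""
    x: float                   # position estimate (pixels)
    v: float = 0.0             # velocity estimate (pixels/frame)
    p: list = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])  # 2x2 covariance
    q: float = 1.0             # process noise, assumed value
    r: float = 4.0             # measurement noise, assumed value

    def predict(self, dt: float = 1.0) -> float:
        # x <- x + v*dt;  P <- F P F^T + q*I with F = [[1, dt], [0, 1]]
        self.x += self.v * dt
        (a, b), (c, d) = self.p
        self.p = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x

    def update(self, z: float) -> None:
        # Only the position is measured (H = [1, 0]); P <- (I - K H) P
        (a, b), (c, d) = self.p
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain for position and velocity
        y = z - self.x                 # innovation
        self.x += k0 * y
        self.v += k1 * y
        self.p = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class OccludedTrack:
    """Track whose box centre is carried forward by two AxisKF filters while it is occluded."""

    def __init__(self, track_id, box):
        self.track_id = track_id
        self.w, self.h = box[2] - box[0], box[3] - box[1]
        self.kx = AxisKF((box[0] + box[2]) / 2)
        self.ky = AxisKF((box[1] + box[3]) / 2)

    def observe(self, box):
        """Visible frame: predict, then correct each axis with the detected box centre."""
        for kf, z in ((self.kx, (box[0] + box[2]) / 2), (self.ky, (box[1] + box[3]) / 2)):
            kf.predict()
            kf.update(z)
        self.w, self.h = box[2] - box[0], box[3] - box[1]

    def predict_box(self):
        """Occluded frame: advance one step; width and height stay frozen (assumption)."""
        cx, cy = self.kx.predict(), self.ky.predict()
        return (cx - self.w / 2, cy - self.h / 2, cx + self.w / 2, cy + self.h / 2)


def reassociate(tracks, detections, iou_thresh=0.3):
    """Greedy matching: detection index -> ID of the best-overlapping predicted occluded track."""
    predicted = {t.track_id: t.predict_box() for t in tracks}
    matches = {}
    for j, det in enumerate(detections):
        best_id, best_iou = None, iou_thresh
        for tid, pbox in predicted.items():
            score = iou(pbox, det)
            if score >= best_iou and tid not in matches.values():
                best_id, best_iou = tid, score
        if best_id is not None:
            matches[j] = best_id
    return matches
```

Greedy matching keeps the sketch short. When several occluded tracks compete for the same detections, an optimal assignment over an IoU cost matrix, such as the Hungarian algorithm, is the more common choice.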

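Results of the proposed method with noise, distortion and translation applied
Setting	MOTA/%	IDF1/%	IDSW
Original (proposed method)	68.9	69.7	1 650
Noise	68.8	69.2	1 443
Distortion	68.1	68.8	2 003
Translation	34.3	28.8	6 338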
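
The perturbation rows are easier to compare as changes relative to the unperturbed first row. The short calculation below uses numbers copied from the table; the helper `changes` is illustrative. Noise and distortion each cost less than 1 percentage point of MOTA and IDF1. Translation, by contrast, more than halves both metrics and raises IDSW about 3.8-fold.

```python
# Rows copied from the table above: (MOTA %, IDF1 %, IDSW).
BASELINE = (68.9, 69.7, 1650)
PERTURBED = {
    "noise": (68.8, 69.2, 1443),
    "distortion": (68.1, 68.8, 2003),
    "translation": (34.3, 28.8, 6338),
}


def changes(baseline=BASELINE, rows=PERTURBED):
    """Per setting: MOTA and IDF1 change in percentage points, IDSW as a multiple of the baseline."""
    out = {}
    for name, (mota, idf1, idsw) in rows.items():
        out[name] = (round(mota - baseline[0], 1),
                     round(idf1 - baseline[1], 1),
                     round(idsw / baseline[2], 2))
    return out


for name, (d_mota, d_idf1, idsw_x) in changes().items():
    print(f"{name:<12} MOTA {d_mota:+.1f} pp  IDF1 {d_idf1:+.1f} pp  IDSW x{idsw_x:.2f}")
```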
