Journal of Computer Applications, 2023, Vol. 43, Issue (6): 1919-1929. DOI: 10.11772/j.issn.1001-9081.2022050753
Special Issue: Multimedia computing and computer simulation
Li WANG, Shibin XUAN, Xuyang QIN, Ziwei LI
Received: 2022-05-25
Revised: 2022-12-22
Accepted: 2022-12-29
Online: 2023-06-08
Published: 2023-06-10
Contact: Shibin XUAN
About author: WANG Li, born in 1995 in Chengdu, Sichuan, M. S. candidate. Her research interests include multi-object tracking and computer vision.
CLC Number:
Li WANG, Shibin XUAN, Xuyang QIN, Ziwei LI. Multi-object tracking method based on dual-decoder Transformer[J]. Journal of Computer Applications, 2023, 43(6): 1919-1929.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022050753
Dataset | Method | IDF1/%↑ | MOTA/%↑ | ML/%↓ | MT/%↑ | IDSW↓ |
---|---|---|---|---|---|---|
MOT16 | MPNTrack | 58.60 | 51.50 | 25.90 | 31.20 | 375 |
MOT16 | CNNMTT | 62.20 | 65.20 | 21.30 | 34.00 | 946 |
MOT16 | DeepSORT | 62.20 | 61.40 | 18.20 | 32.80 | 1 423 |
MOT16 | NOMTwSDP16 | 62.60 | 62.20 | 31.10 | 32.50 | 406 |
MOT16 | Lif_T | 64.70 | 61.30 | 34.00 | 27.00 | 1 389 |
MOT16 | POI | 65.10 | 66.10 | 20.80 | 39.00 | 3 093 |
MOT16 | RAR16wVGG | 63.80 | 63.00 | 22.10 | 39.90 | 482 |
MOT16 | JDE | 55.80 | 64.40 | 20.00 | 35.40 | 1 544 |
MOT16 | Proposed method | 65.13 | 65.39 | 15.80 | 34.00 | 971 |
MOT17 | TransTrack | 56.90 | 65.80 | 21.80 | 32.20 | 5 355 |
MOT17 | CTracker | 57.40 | 66.60 | 24.20 | 32.22 | 5 529 |
MOT17 | TubeTK | 58.60 | 63.00 | 19.87 | 31.21 | 4 137 |
MOT17 | TrackFormer | 63.90 | 65.00 | 13.76 | 45.60 | 3 528 |
MOT17 | CenterTrack | 64.70 | 67.80 | 24.58 | 34.65 | 3 039 |
MOT17 | Proposed method | 64.77 | 65.41 | 13.63 | 45.22 | 2 952 |
MOT20 | SORT20 | 45.10 | 42.70 | 26.20 | 16.70 | 4 334 |
MOT20 | MLT | 48.90 | 54.60 | 22.10 | 30.90 | 2 187 |
MOT20 | Tracktor++V2 | 52.70 | 52.60 | 26.70 | 29.40 | 1 648 |
MOT20 | Proposed method | 53.69 | 55.26 | 23.90 | 36.00 | 1 169 |
Tab. 1 Comparison of the proposed method with state-of-the-art methods on each dataset
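The MOTA and IDF1 columns follow the standard CLEAR MOT and identity-based definitions. As a minimal sketch of that arithmetic (the function names and the example counts are illustrative assumptions, not values from the paper), they can be computed from error counts roughly as follows:

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, where GT is the number of ground-truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), with counts taken under the
    best one-to-one mapping between predicted and ground-truth identities."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_mota = mota(false_negatives=300, false_positives=40, id_switches=10, num_gt=1000)
example_idf1 = idf1(idtp=650, idfp=350, idfn=350)
print(f"MOTA = {example_mota:.1%}, IDF1 = {example_idf1:.1%}")
```

Because identity switches enter MOTA only as one term among the error counts, a method can lower IDSW noticeably while MOTA changes little, which is why the table reports IDF1 and IDSW alongside MOTA.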
Method | MOTA/% | IDF1/% | IDSW |
---|---|---|---|
TrackFormer | 68.1 | 67.5 | 2 097 |
TrackFormer+Kalman (camera motion not checked) | 67.5 | 64.5 | 2 172 |
TrackFormer+Kalman (camera judged to be static) | 68.6 | 67.8 | 2 013 |
TrackFormer+histogram matching | 68.3 | 69.2 | 1 689 |
TrackFormer+dual decoder | 68.9 | 67.6 | 1 794 |
All modules | 68.9 | 69.7 | 1 650 |
Tab. 2 Ablation experimental results of each module
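To read the ablation at a glance, the change of each variant relative to the TrackFormer baseline can be tabulated. The sketch below only transcribes the numbers from Tab. 2; the dictionary keys are shortened English labels introduced here for illustration.

```python
# Values transcribed from Tab. 2: (MOTA %, IDF1 %, IDSW).
ablation = {
    "TrackFormer": (68.1, 67.5, 2097),
    "+Kalman (camera motion not checked)": (67.5, 64.5, 2172),
    "+Kalman (camera judged static)": (68.6, 67.8, 2013),
    "+histogram matching": (68.3, 69.2, 1689),
    "+dual decoder": (68.9, 67.6, 1794),
    "all": (68.9, 69.7, 1650),
}

base_mota, base_idf1, base_idsw = ablation["TrackFormer"]

# Difference of every variant from the baseline row.
deltas = {
    name: (round(m - base_mota, 1), round(f - base_idf1, 1), s - base_idsw)
    for name, (m, f, s) in ablation.items()
    if name != "TrackFormer"
}

for name, (d_mota, d_idf1, d_idsw) in deltas.items():
    print(f"{name:<36} MOTA {d_mota:+.1f}  IDF1 {d_idf1:+.1f}  IDSW {d_idsw:+d}")
```

Read this way, the Kalman filter helps only when the camera is judged to be static, histogram matching accounts for most of the IDF1 gain, and the dual decoder accounts for the MOTA gain.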
Task | MOTA/% | IDF1/% | IDSW |
---|---|---|---|
Proposed method | 68.9 | 69.7 | 1 650 |
Noise | 68.8 | 69.2 | 1 443 |
Distortion | 68.1 | 68.8 | 2 003 |
Translation | 34.3 | 28.8 | 6 338 |
Tab. 3 Robustness experimental results on MOT17 dataset
1 | ZVEJNIEKS P, BIRJUKOVS M, KLEVS M, et al. MHT-X: offline multiple hypothesis tracking with algorithm X [J]. Experiments in Fluids, 2022, 63(3): No.55. 10.1007/s00348-022-03399-5 |
2 | HA N D, SHIMIZU I, BAO P T. Tracking objects based on multiple particle filters for multipart combined moving directions information[J]. Computational Intelligence and Neuroscience, 2020, 2020: No.8839725. 10.1155/2020/8839725 |
3 | FROSSARD D, URTASUN R. End-to-end learning of multi-sensor 3D tracking by detection[C]// Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 2018: 635-642. 10.1109/icra.2018.8462884 |
4 | XIANG Y, ALAHI A, SAVARESE S. Learning to track: online multi-object tracking by decision making [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4705-4713. 10.1109/iccv.2015.534 |
5 | LI Z X, BILODEAU G A, BOUACHIR W. Multiple convolutional features in siamese networks for object tracking [J]. Machine Vision and Applications, 2021, 32(3): No.59. 10.1007/s00138-021-01185-7 |
6 | TANG S Y, ANDRES B, ANDRILUKA M, et al. Subgraph decomposition for multi-target tracking[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 5033-5041. 10.1109/cvpr.2015.7299138 |
7 | CHU Q, OUYANG W L, LI H S, et al. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 4846-4855. 10.1109/iccv.2017.518 |
8 | SADEGHIAN A, ALAHI A, SAVARESE S. Tracking the untrackable: learning to track multiple cues with long-term dependencies[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 300-311. 10.1109/iccv.2017.41 |
9 | BREITENSTEIN M D, REICHLIN F, LEIBE B, et al. Online multiperson tracking-by-detection from a single, uncalibrated camera[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(9): 1820-1833. 10.1109/tpami.2010.232 |
10 | SHARMA S, ANSARI J A, MURTHY J K, et al. Beyond pixels: leveraging geometry and shape cues for online multi-object tracking[C]// Proceedings of the 2018 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 2018: 3508-3515. 10.1109/icra.2018.8461018 |
11 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017:6000-6010. |
12 | SUN P Z, CAO J K, JIANG Y, et al. TransTrack: multiple object tracking with transformer [EB/OL]. (2021-05-04) [2022-04-12]. |
13 | MEINHARDT T, KIRILLOV A, LEAL-TAIXÉ L, et al. TrackFormer: multi-object tracking with Transformers[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 8834-8844. 10.1109/cvpr52688.2022.00864 |
14 | XU Y H, BAN Y T, DELORME G, et al. TransCenter: transformers with dense queries for multiple-object tracking[EB/OL]. (2022-09-30) [2022-10-11]. 10.1109/tpami.2022.3225078 |
15 | SHENG H, ZHANG Y, CHEN J H, et al. Heterogeneous association graph fusion for target association in multiple object tracking [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(11): 3269-3280. 10.1109/tcsvt.2018.2882192 |
16 | KIM C, LI F X, CIPTADI A, et al. Multiple hypothesis tracking revisited[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4696-4704. 10.1109/iccv.2015.533 |
17 | KEUPER M, TANG S Y, ANDRES B, et al. Motion segmentation & multiple object tracking by correlation co-clustering [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(1): 140-153. 10.1109/tpami.2018.2876253 |
18 | YU Q, MEDIONI G, COHEN I. Multiple target tracking using spatio-temporal Markov chain Monte Carlo data association[C]// Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2007: 1-8. 10.1109/cvpr.2007.382991 |
19 | TANG S Y, ANDRILUKA M, ANDRES B, et al. Multiple people tracking by lifted multicut and person re-identification[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3701-3710. 10.1109/cvpr.2017.394 |
20 | BRASÓ G, LEAL-TAIXÉ L. Learning a neural solver for multiple object tracking[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6246-6256. 10.1109/cvpr42600.2020.00628 |
21 | LEAL-TAIXÉ L, CANTON-FERRER C, SCHINDLER K. Learning by tracking: Siamese CNN for robust target association[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2016: 418-425. 10.1109/cvprw.2016.59 |
22 | RISTANI E, TOMASI C. Features for multi-target multi-camera tracking and re-identification [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6036-6046. 10.1109/cvpr.2018.00632 |
23 | CHEN L, AI H Z, ZHUANG Z J, et al. Real-time multiple people tracking with deeply learned candidate selection and person re-identification[C]// Proceedings of the 2018 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2018: 1-6. 10.1109/icme.2018.8486597 |
24 | CHU P, LING H B. FAMNeT: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 6172-6181. 10.1109/iccv.2019.00627 |
25 | BERGMANN P, MEINHARDT T, LEAL-TAIXÉ L. Tracking without bells and whistles [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 941-951. 10.1109/iccv.2019.00103 |
26 | ZHOU X Y, KOLTUN V, KRÄHENBÜHL P. Tracking objects as points[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 474-490. |
27 | HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. 10.1109/iccv.2017.322 |
28 | VOIGTLAENDER P, KRAUSE M, OSEP A, et al. MOTS: multi-object tracking and segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 7934-7943. 10.1109/cvpr.2019.00813 |
29 | PORZI L, HOFINGER M, RUIZ I, et al. Learning multi-object tracking and segmentation from automatic annotations[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6845-6854. 10.1109/cvpr42600.2020.00688 |
30 | XU Z B, ZHANG W, TAN X, et al. Segment as points for efficient online multi-object tracking and segmentation[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 264-281. |
31 | ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable Transformers for end-to-end object detection [EB/OL]. (2021-03-18) [2022-04-12]. 10.1609/aaai.v36i1.19893 |
32 | MILAN A, LEAL-TAIXÉ L, REID I, et al. MOT16: a benchmark for multi-object tracking[EB/OL]. (2016-05-03) [2022-04-12]. |
33 | DENDORFER P, REZATOFIGHI H, MILAN A, et al. MOT20: a benchmark for multi object tracking in crowded scenes [EB/OL]. (2020-03-19) [2022-04-12]. |
34 | BERNARDIN K, STIEFELHAGEN R. Evaluating multiple object tracking performance: the CLEAR MOT metrics[J]. EURASIP Journal on Image and Video Processing, 2008, 2008: No.246309. 10.1155/2008/246309 |
35 | MAHMOUDI N, AHADI S M, RAHMATI M. Multi-target tracking using CNN-based features: CNNMTT[J]. Multimedia Tools and Applications, 2019, 78(6): 7077-7096. 10.1007/s11042-018-6467-6 |
36 | WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]// Proceedings of the 2017 IEEE International Conference on Image Processing. Piscataway: IEEE, 2017: 3645-3649. 10.1109/icip.2017.8296962 |
37 | CHOI W. Near-online multi-target tracking with aggregated local flow descriptor[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 3029-3037. 10.1109/iccv.2015.347 |
38 | HORNAKOVA A, HENSCHEL R, ROSENHAHN B, et al. Lifted disjoint paths with application in multiple object tracking[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 4364-4375. 10.51202/9783186875105-130 |
39 | YU F W, LI W B, LI Q Q, et al. POI: multiple object tracking with high performance detection and appearance feature[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 36-42. |
40 | FANG K, XIANG Y, LI X C, et al. Recurrent autoregressive networks for online multi-object tracking [C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 466-475. 10.1109/wacv.2018.00057 |
41 | WANG Z D, ZHENG L, LIU Y X, et al. Towards real-time multi-object tracking [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12356. Cham: Springer, 2020: 107-122. |
42 | PENG J L, WANG C A, WAN F B, et al. Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 145-161. |
43 | PANG B, LI Y Z, ZHANG Y F, et al. TubeTK: adopting tubes to track multi-object in a one-step training model [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 6307-6317. 10.1109/cvpr42600.2020.00634 |
44 | BEWLEY A, GE Z Y, OTT L, et al. Simple online and realtime tracking[C]// Proceedings of the 2016 IEEE International Conference on Image Processing. Piscataway: IEEE, 2016: 3464-3468. 10.1109/icip.2016.7533003 |
45 | ZHANG Y, SHENG H, WU Y B, et al. Multiplex labeling graph for near-online tracking in crowded scenes [J]. IEEE Internet of Things Journal, 2020, 7(9): 7892-7902. 10.1109/jiot.2020.2996609 |