Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (6): 1919-1929. DOI: 10.11772/j.issn.1001-9081.2022050753

Special Issue: Multimedia computing and computer simulation

• Multimedia computing and computer simulation •

Multi-object tracking method based on dual-decoder Transformer

Li WANG1, Shibin XUAN1,2, Xuyang QIN1, Ziwei LI1

  1. School of Artificial Intelligence, Guangxi Minzu University, Nanning, Guangxi 530006, China
  2. Guangxi Key Laboratory of Hybrid Computation and IC Design and Analysis (Guangxi Minzu University), Nanning, Guangxi 530006, China
  • Received: 2022-05-25 Revised: 2022-12-22 Accepted: 2022-12-29 Online: 2023-06-08 Published: 2023-06-10
  • Contact: Shibin XUAN
  • About author: WANG Li, born in 1995 in Chengdu, Sichuan, M.S. candidate. Her research interests include multi-object tracking and computer vision.
    QIN Xuyang, born in 1995, M.S. candidate. His research interests include object tracking and deep learning.
    LI Ziwei, born in 1997, M.S. candidate. Her research interests include semantic segmentation and computer vision.
  • Supported by:
    National Natural Science Foundation of China (6186603)




Multi-Object Tracking (MOT) must track multiple objects simultaneously while keeping each object's identity consistent over time. To address object occlusion, object ID Switch (IDSW), and object loss in current MOT pipelines, a Transformer-based MOT model was improved, and a multi-object tracking method based on a dual-decoder Transformer was proposed. Firstly, a set of trajectories was generated by model initialization in the first frame, and in every subsequent frame attention was used to establish associations between frames. Secondly, a dual decoder was used to correct the tracked object information: one decoder detected objects and the other tracked them. Thirdly, after tracking was completed, histogram template matching was applied to recover lost objects. Finally, a Kalman filter was used to track and predict occluded objects, and the predictions for occluded objects were associated with newly detected objects to keep the tracking results continuous. In addition, modeling of appearance statistics and motion features was added on top of TrackFormer, fusing these different structures. Experimental results on the MOT17 dataset show that, compared with TrackFormer, the proposed algorithm increases the IDentity F1 Score (IDF1) by 0.87 percentage points, increases the Multiple Object Tracking Accuracy (MOTA) by 0.41 percentage points, and reduces the number of IDSWs by 16.3%. The proposed method also achieves good results on the MOT16 and MOT20 datasets. Consequently, the proposed method can effectively handle object occlusion, maintain object identity information, and reduce object identity loss.
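
The two post-processing ideas named in the abstract, histogram template matching for lost objects and Kalman prediction for occluded objects, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details, so the per-channel histogram with 16 bins, the Bhattacharyya coefficient as the similarity measure, the 0.8 threshold, the constant-velocity motion model, and all function and class names (`color_histogram`, `match_lost_track`, `ConstantVelocityKalman`) are assumptions chosen only for illustration.

```python
import numpy as np


def color_histogram(patch, bins=16):
    """Normalized per-channel histogram of an H x W x C uint8 image patch."""
    channels = [
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(patch.shape[-1])
    ]
    hist = np.concatenate(channels).astype(np.float64)
    return hist / max(hist.sum(), 1.0)


def histogram_similarity(h1, h2):
    """Bhattacharyya coefficient: near 1 for identical histograms, 0 for disjoint."""
    return float(np.sum(np.sqrt(h1 * h2)))


def match_lost_track(template_hist, candidate_patches, threshold=0.8):
    """Index of the candidate most similar to the template, or None.

    Assumption: a lost track keeps the histogram of its last confident
    appearance as the template; 0.8 is an arbitrary illustrative threshold.
    """
    best_idx, best_score = None, threshold
    for i, patch in enumerate(candidate_patches):
        score = histogram_similarity(template_hist, color_histogram(patch))
        if score >= best_score:
            best_idx, best_score = i, score
    return best_idx


class ConstantVelocityKalman:
    """Kalman filter with state [cx, cy, vx, vy] and measurement [cx, cy].

    Assumption: box centers move with roughly constant velocity between frames.
    """

    def __init__(self, cx, cy, process_var=1e-2, meas_var=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0], dtype=np.float64)
        self.P = np.eye(4) * 10.0          # initial state uncertainty
        self.F = np.eye(4)                 # state transition, one frame per step
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.eye(2, 4)              # only the center is observed
        self.Q = np.eye(4) * process_var   # process noise
        self.R = np.eye(2) * meas_var      # measurement noise

    def predict(self):
        """Advance one frame; used alone while the object is occluded."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Correct the state with an observed center (cx, cy)."""
        y = np.asarray(z, dtype=np.float64) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In such a sketch, a visible track would call `predict()` and then `update()` every frame, while an occluded track would call only `predict()`; a new detection close to the predicted center, or one selected by `match_lost_track`, could then inherit the old identity instead of starting a new one.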

Key words: Multi-Object Tracking (MOT), attention, Transformer, histogram, template matching, Kalman filter



