Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (4): 1248-1254. DOI: 10.11772/j.issn.1001-9081.2022030426

• Multimedia computing and computer simulation •

Tracking appearance features based on attention self-correlation mechanism

Guangyi DOU1,2, Fanan WEI1, Chuangyi QIU1,2, Jianshu CHAO2

  1. School of Advanced Manufacturing, Fuzhou University, Quanzhou Fujian 362000, China
    2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou Fujian 362000, China
  • Received: 2022-04-06 Revised: 2022-05-31 Accepted: 2022-06-10 Online: 2023-04-11 Published: 2023-04-10
  • Contact: Fanan WEI
  • About author: DOU Guangyi, born in 1999, a native of Dezhou, Shandong, M.S. candidate. His research interests include image processing and object tracking.
    QIU Chuangyi, born in 1998, a native of Fuzhou, Fujian, M.S. candidate. His research interests include image processing and person re-identification.
    CHAO Jianshu, born in 1984, a native of Jiangyin, Jiangsu, Ph.D. His research interests include image processing and object tracking.
  • Supported by:
    National Natural Science Foundation of China (61803088)

Abstract:

To address problems in Multi-Object Tracking (MOT) algorithms such as ID Switches (IDS) caused by blurred pedestrian features, and to verify the importance of pedestrian appearance in the tracking process, an Attention Self-Correlation Network (ASCN) based on a center-point detection model was proposed. Firstly, channel and spatial attention networks were applied to the original image to obtain two different feature maps, and the deep information was decoupled. Then, more accurate pedestrian appearance features and pedestrian orientation information were obtained through self-correlation learning between the feature maps, and this information was used in the association step of tracking. In addition, a tracking dataset of low-frame-rate videos was produced to verify the performance of the improved algorithm. When the video frame rate was not ideal, the improved algorithm obtained pedestrian appearance information through ASCN and achieved better accuracy and robustness than algorithms using only pedestrian orientation information. Finally, the improved algorithm was tested on the MOT17 dataset of MOT Challenge. Experimental results show that, compared with FairMOT (Fairness in MOT) without ASCN, the improved algorithm increases Multiple Object Tracking Accuracy (MOTA) and Identification F-Score (IDF1) by 0.5 and 1.1 percentage points respectively, reduces the number of IDS by 32.2%, and runs at 21.2 frames per second on a single NVIDIA Tesla V100 card. These results show that the improved algorithm not only reduces errors in the tracking process but also improves overall tracking performance, while meeting real-time requirements.

Key words: deep learning, multi-object tracking, pedestrian feature, attention mechanism, low frame rate
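
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how MOTA, IDF1, and a relative IDS reduction are conventionally computed from aggregate counts (the standard CLEAR MOT and identity-metric definitions). The function names and the example counts are hypothetical illustrations, not values or code from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identification F-score: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def relative_reduction(before, after):
    """Fractional decrease of a count, e.g. ID switches before and after a change."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    print(f"MOTA: {mota(200, 100, 50, 1000):.3f}")
    print(f"IDF1: {idf1(700, 100, 200):.3f}")
    print(f"IDS reduction: {relative_reduction(1000, 678):.1%}")
```

Note that MOTA and IDF1 gains in the abstract are reported in percentage points (absolute differences), whereas the IDS decrease is a relative percentage, which is what `relative_reduction` computes.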

CLC Number: