Tracking appearance features based on attention self-correlation mechanism

doi:10.11772/j.issn.1001-9081.2022030426

Journal of Computer Applications, 2023, Vol. 43, Issue (4): 1248-1254.

Special Issue: Multimedia computing and computer simulation

Guangyi DOU¹,², Fanan WEI¹, Chuangyi QIU¹,², Jianshu CHAO²

1. School of Advanced Manufacturing, Fuzhou University, Quanzhou Fujian 362000, China
2. Quanzhou Institute of Equipment Manufacturing, Haixi Institutes, Chinese Academy of Sciences, Quanzhou Fujian 362000, China

Received: 2022-04-06 Revised: 2022-05-31 Accepted: 2022-06-10 Online: 2023-04-11 Published: 2023-04-10
Contact: Fanan WEI
About the authors: DOU Guangyi, born in 1999, from Dezhou, Shandong, M. S. candidate. His research interests include image processing and object tracking.
QIU Chuangyi, born in 1998, from Fuzhou, Fujian, M. S. candidate. His research interests include image processing and person re-identification.
CHAO Jianshu, born in 1984, from Jiangyin, Jiangsu, Ph. D. His research interests include image processing and object tracking.
Supported by:
National Natural Science Foundation of China (61803088)

Abstract

To address problems in Multi-Object Tracking (MOT) algorithms such as ID switches (IDS) caused by ambiguous pedestrian features, and to verify the importance of pedestrian appearance in the tracking process, an Attention Self-Correlation Network (ASCN) based on a center-point detection model was proposed. First, channel and spatial attention networks were applied to the original image to obtain two different feature maps, decoupling the deep information. Then, more accurate pedestrian appearance features and pedestrian position information were obtained through self-correlation learning between the two feature maps, and this information was used in the tracking association process. In addition, a tracking dataset of low-frame-rate videos was produced to verify the performance of the improved algorithm. When the video frame rate was not ideal, the improved algorithm obtained pedestrian appearance information through ASCN and achieved better accuracy and robustness than algorithms that use only pedestrian position information. Finally, the improved algorithm was tested on the MOT17 dataset of the MOT Challenge. Experimental results show that, compared with FairMOT (Fairness in MOT) without ASCN, the improved algorithm increases the Multiple Object Tracking Accuracy (MOTA) and Identification F-Score (IDF1) by 0.5 and 1.1 percentage points respectively, reduces the number of IDS by 32.2%, and runs at 21.2 frames per second on a single NVIDIA Tesla V100 GPU. These results show that the improved algorithm not only reduces errors in the tracking process but also improves overall tracking performance while meeting real-time requirements.

Key words: deep learning, multi-object tracking, pedestrian feature, attention mechanism, low frame rate
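The abstract names the ingredients of ASCN (a channel-attention branch, a spatial-attention branch, and self-correlation learning between the two resulting feature maps) without giving layer details. As a rough illustration only, the NumPy sketch below assumes CBAM-style attention [14] for the two branches and reads "self-correlation" as a cosine-similarity affinity between every position of the two attended maps, which is then used to re-aggregate features. The function names, the reduction ratio, the 1×1 spatial mixing (CBAM itself uses a 7×7 convolution) and the random weights are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """CBAM-style channel attention on a (C, H, W) map.

    A shared two-layer MLP (w1: (C//r, C), w2: (C, C//r)) scores the
    average- and max-pooled channel descriptors; the sigmoid of their sum
    re-weights each channel.
    """
    def mlp(v: np.ndarray) -> np.ndarray:
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    avg = feat.mean(axis=(1, 2))  # (C,)
    mx = feat.max(axis=(1, 2))    # (C,)
    weights = sigmoid(mlp(avg) + mlp(mx))  # (C,)
    return feat * weights[:, None, None], weights


def spatial_attention(feat: np.ndarray, a: float, b: float):
    """Simplified spatial attention: a 1x1 mix of the channel-wise mean and
    max maps (hypothetical stand-in for CBAM's 7x7 convolution)."""
    weights = sigmoid(a * feat.mean(axis=0) + b * feat.max(axis=0))  # (H, W)
    return feat * weights[None, :, :], weights


def self_correlation(f_c: np.ndarray, f_s: np.ndarray):
    """Hypothetical correlation step between the two attended maps.

    Each spatial position of f_c is compared (cosine similarity) with every
    position of f_s; a row-wise softmax turns the similarities into weights
    that re-aggregate the f_s features.
    """
    c, h, w = f_c.shape
    q = f_c.reshape(c, h * w)
    k = f_s.reshape(c, h * w)
    qn = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    kn = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    sim = qn.T @ kn                          # (HW, HW) cosine similarities
    sim -= sim.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)  # each row sums to 1
    out = k @ attn.T                         # out[:, i] = sum_j attn[i, j] * k[:, j]
    return out.reshape(c, h, w), attn


rng = np.random.default_rng(0)
C, H, W, R = 8, 6, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // R, C))
w2 = 0.1 * rng.standard_normal((C, C // R))
f_c, ch_w = channel_attention(feat, w1, w2)
f_s, sp_w = spatial_attention(feat, a=0.5, b=0.5)
out, attn = self_correlation(f_c, f_s)
print(out.shape)  # (8, 6, 4)
```

In a trained network the MLP weights and mixing coefficients would be learned, and the re-aggregated map would presumably feed the appearance branch of the center-point tracker; the sketch only demonstrates the shapes and normalisation involved.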

CLC Number: TP391.41

Guangyi DOU, Fanan WEI, Chuangyi QIU, Jianshu CHAO. Tracking appearance features based on attention self-correlation mechanism[J]. Journal of Computer Applications, 2023, 43(4): 1248-1254.

Figures/Tables 15

Fig. 1 Attention self-correlation network structure

Tab. 1 Experimental results of different thresholds

$\tau_h$	MOTA/% ($\tau_l$=0.2)	IDF1/% ($\tau_l$=0.2)	IDS ($\tau_l$=0.2)	MOTA/% ($\tau_l$=0.1)	IDF1/% ($\tau_l$=0.1)	IDS ($\tau_l$=0.1)
0.3	83.4	81.3	552	82.7	81.0	553
0.4	84.1	82.0	499	83.4	81.7	499
0.5	80.9	81.2	432	80.8	81.2	428
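Tab. 1 does not define $\tau_h$ and $\tau_l$. One plausible reading, given the BYTE association that appears in Tab. 6 and Tab. 7, is that they are the high and low detection-confidence thresholds of ByteTrack [12]: boxes scoring at least $\tau_h$ enter the first association, boxes between $\tau_l$ and $\tau_h$ are kept for a second round, and the rest are dropped. The Python sketch below shows only that split; the `Detection` type, the function name and the boundary handling (inclusive lower bounds) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float                            # detector confidence in [0, 1]


def split_by_score(dets: List[Detection], tau_h: float, tau_l: float):
    """Hypothetical BYTE-style split into high- and low-confidence detections."""
    if not 0.0 <= tau_l < tau_h <= 1.0:
        raise ValueError("expected 0 <= tau_l < tau_h <= 1")
    high = [d for d in dets if d.score >= tau_h]         # first association
    low = [d for d in dets if tau_l <= d.score < tau_h]  # second association
    return high, low                                     # everything else is discarded


dets = [Detection((0, 0, 10, 20), s) for s in (0.9, 0.45, 0.35, 0.15, 0.05)]
high, low = split_by_score(dets, tau_h=0.4, tau_l=0.2)
print([d.score for d in high], [d.score for d in low])  # [0.9, 0.45] [0.35]
```

Lowering $\tau_l$ from 0.2 to 0.1 would, under this reading, pass the 0.15 box to the second round as well.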

Fig. 2 Processing of associated detection bounding boxes
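The figure itself is not reproduced here. Under the same BYTE-style reading, a two-stage association could look like the sketch below: high-score detections are matched to tracks with a cost that blends appearance (cosine distance between embeddings) and position (1 − IoU), and tracks still unmatched are then matched to low-score detections by IoU alone. The blend weight `lam`, the cost gates, and the greedy matcher (used here in place of the optimal Hungarian-type assignment that trackers such as FairMOT and ByteTrack use) are illustrative assumptions, not the paper's settings.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (nu * nv) if nu > 0 and nv > 0 else 1.0


def greedy_match(cost: List[List[float]], max_cost: float) -> List[Tuple[int, int]]:
    """Repeatedly take the cheapest unused (row, col) pair not above max_cost."""
    pairs = sorted((c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row))
    used_rows, used_cols, matches = set(), set(), []
    for c, i, j in pairs:
        if c > max_cost:
            break
        if i in used_rows or j in used_cols:
            continue
        used_rows.add(i)
        used_cols.add(j)
        matches.append((i, j))
    return matches


def associate(tracks, high, low, lam=0.5, max_cost=0.7, max_iou_cost=0.5):
    """tracks/high: lists of (box, embedding); low: list of boxes.

    Returns (stage-1 (track, high-det) pairs, stage-2 (track, low-det) pairs).
    """
    # Stage 1: blended appearance + position cost for high-score detections.
    cost1 = [[lam * cosine_distance(te, de) + (1 - lam) * (1 - iou(tb, db))
              for db, de in high] for tb, te in tracks]
    m1 = greedy_match(cost1, max_cost)
    matched = {i for i, _ in m1}
    left = [i for i in range(len(tracks)) if i not in matched]
    # Stage 2: position (IoU) only for low-score detections.
    cost2 = [[1 - iou(tracks[i][0], lb) for lb in low] for i in left]
    m2 = [(left[r], j) for r, j in greedy_match(cost2, max_iou_cost)]
    return m1, m2


tracks = [((0, 0, 10, 20), [1.0, 0.0]), ((50, 0, 60, 20), [0.0, 1.0])]
high = [((51, 0, 61, 20), [0.1, 1.0])]  # overlaps track 1 and looks like it
low = [(1, 0, 11, 20)]                  # overlaps track 0; no embedding used
print(associate(tracks, high, low))     # ([(1, 0)], [(0, 0)])
```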

Tab. 2 Results of each algorithm on MOT17-06

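Association information	Algorithm	MOTA/%	IDF1/%	IDS
Position	TransTrack [15]	53.7	45.0	156
Position	Chained-Tracker [16]	56.1	55.2	261
Position	ByteTrack [12]	60.2	58.9	249
Position + appearance	RelationTrack [17]	60.9	67.0	59
Position + appearance	CSTrack [18]	61.6	63.9	168
Position + appearance	FairMOT [5]	64.1	65.9	176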

Tab. 3 Dataset frame number comparison

Sequence	Original frame rate/(frame·s⁻¹)	Frames (original dataset)	Frames (20-frame dataset)	Frames (15-frame dataset)
Total		5 316	3 823	3 077
MOT17-02	30	600	400	300
MOT17-04	30	1 050	700	525
MOT17-05	14	837	837	837
MOT17-09	30	525	350	263
MOT17-10	30	654	436	327
MOT17-11	30	900	600	450
MOT17-13	25	750	500	375
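Tab. 3 gives frame counts but not the sampling rule. The counts are reproduced exactly if the 20-frame variant drops every third frame (keeping 2/3) and the 15-frame variant keeps every other frame (ceil(n/2)), with MOT17-05, recorded at 14 frame/s, left unchanged. The Python sketch below encodes that inferred rule; it is a reconstruction from the numbers, not the authors' stated procedure, and the helper names are hypothetical. One consequence: MOT17-13 (25 frame/s) shrinks by the same ratios, so its effective rates would be about 16.7 and 12.5 frame/s rather than 20 and 15.

```python
from typing import Dict, List, Sequence

# Original frame counts from Tab. 3.
ORIGINAL: Dict[str, int] = {
    "MOT17-02": 600, "MOT17-04": 1050, "MOT17-05": 837, "MOT17-09": 525,
    "MOT17-10": 654, "MOT17-11": 900, "MOT17-13": 750,
}
UNCHANGED = {"MOT17-05"}  # recorded at 14 frame/s, kept as-is in Tab. 3


def keep_two_of_three(frames: Sequence[int]) -> List[int]:
    """Drop every third frame (0-based positions 2, 5, 8, ...)."""
    return [f for i, f in enumerate(frames) if i % 3 != 2]


def keep_every_other(frames: Sequence[int]) -> List[int]:
    """Keep 0-based positions 0, 2, 4, ... (ceil(n / 2) frames)."""
    return list(frames[::2])


def subsample_counts(variant: str) -> Dict[str, int]:
    """Frame count per sequence for the hypothetical '20' or '15' rule."""
    rule = {"20": keep_two_of_three, "15": keep_every_other}[variant]
    counts = {}
    for seq, n in ORIGINAL.items():
        frames = range(1, n + 1)  # MOT frame ids start at 1
        counts[seq] = n if seq in UNCHANGED else len(rule(frames))
    return counts


for variant in ("20", "15"):
    counts = subsample_counts(variant)
    print(variant, counts, sum(counts.values()))
# totals: 3823 for "20", 3077 for "15", matching Tab. 3
```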

Tab. 4 Comparison of training results

Algorithm	MOTA/% (MOT17_val)	IDF1/% (MOT17_val)	IDS (MOT17_val)	MOTA/% (MOT17_test)	IDF1/% (MOT17_test)	IDS (MOT17_test)
FairMOT	67.5	69.9	408	69.8	69.9	3 996
Proposed algorithm	70.2	72.0	305	71.1	71.4	3 276

Tab. 5 Comparative results of ByteTrack and FairMOT on datasets at different frame rates

Frame rate/(frame·s⁻¹)	MOTA/% (ByteTrack)	IDF1/% (ByteTrack)	IDS (ByteTrack)	MOTA/% (FairMOT)	IDF1/% (FairMOT)	IDS (FairMOT)
30	90.0	83.3	422	83.8	81.9	553
20	88.6	81.0	859	83.0	81.6	709
15	87.3	81.1	911	82.3	81.4	650
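In Tab. 5, ByteTrack's IDS roughly doubles (422 to 859) when the frame rate drops from 30 to 20 frame/s, while FairMOT's rises from 553 to 709. One plausible mechanism for the position-only tracker is that a lower frame rate enlarges the displacement between sampled frames and so shrinks box overlap. The Python sketch below illustrates this with a hypothetical pedestrian box (50×150 px) moving horizontally at a hypothetical 300 px/s; the numbers are illustrative, not measured.

```python
def iou_shifted(w: float, h: float, dx: float) -> float:
    """IoU between a w x h box and the same box shifted horizontally by dx pixels."""
    inter = max(0.0, w - abs(dx)) * h
    union = 2 * w * h - inter
    return inter / union


SPEED_PX_PER_S = 300.0  # hypothetical horizontal speed in the image
W, H = 50.0, 150.0      # hypothetical pedestrian box size in pixels

for fps in (30, 20, 15):
    dx = SPEED_PX_PER_S / fps  # displacement between consecutive sampled frames
    print(fps, round(iou_shifted(W, H, dx), 3))
# 30 0.667
# 20 0.538
# 15 0.429
```

As overlap shrinks, IoU-only matching has less margin to separate nearby pedestrians, which is the situation an appearance cue is meant to help with; the size of the effect depends entirely on the assumed speed and box size.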

Tab. 6 Comparative results of different algorithms on datasets at different frame rates

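Tracking algorithm	MOTA/% (20-frame dataset)	IDF1/% (20-frame dataset)	IDS (20-frame dataset)	MOTA/% (15-frame dataset)	IDF1/% (15-frame dataset)	IDS (15-frame dataset)
FairMOT	83.0	81.6	709	82.3	81.4	650
FairMOT + BYTE	83.3	82.0	649	82.5	81.6	590
Proposed algorithm	82.3	82.3	553	81.4	82.0	555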

Tab. 7 Ablation study

Algorithm	MOTA/% (MOT17_val)	IDF1/% (MOT17_val)	IDS (MOT17_val)	MOTA/% (MOT17_train)	IDF1/% (MOT17_train)	IDS (MOT17_train)
baseline	67.5	69.9	408	80.8	79.1	2 100
+ASCN	68.7	72.2	370	82.4	81.2	1 713
+ASCN & BYTE	70.2	72.0	305	82.8	81.2	1 416

Tab. 8 Comparison of the proposed algorithm with SOTA

Algorithm	MOTA/%	IDF1/%	IDS	Speed/(frame·s⁻¹)
Chained-Tracker [16]	66.6	57.4	5 529	6.8
CenterTrack [13]	67.8	64.7	3 039	17.5
FairMOT [5]	73.7	72.3	3 303	25.9
Proposed algorithm	74.2	73.4	2 238	21.2
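The abstract's headline figures can be checked against Tab. 8. The short Python snippet below recomputes the percentage-point gains in MOTA and IDF1, the relative reduction in IDS, and the relative speed cost with respect to FairMOT; the helper names are illustrative.

```python
# Tab. 8 values.
FAIRMOT = {"MOTA": 73.7, "IDF1": 72.3, "IDS": 3303, "FPS": 25.9}
PROPOSED = {"MOTA": 74.2, "IDF1": 73.4, "IDS": 2238, "FPS": 21.2}


def pp_gain(new: float, old: float) -> float:
    """Difference of two percentages, in percentage points."""
    return round(new - old, 1)


def relative_drop(new: float, old: float) -> float:
    """Relative decrease from old to new, in percent."""
    return round(100.0 * (old - new) / old, 1)


print(pp_gain(PROPOSED["MOTA"], FAIRMOT["MOTA"]))      # 0.5
print(pp_gain(PROPOSED["IDF1"], FAIRMOT["IDF1"]))      # 1.1
print(relative_drop(PROPOSED["IDS"], FAIRMOT["IDS"]))  # 32.2
print(relative_drop(PROPOSED["FPS"], FAIRMOT["FPS"]))  # 18.1
```

The first three values match the abstract; the last shows that the gains come with an 18.1% lower frame rate (25.9 to 21.2 frame/s).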

Fig. 3 Heat map of feature extraction branch

Fig. 4 Robustness visualization of improved network

Tab. 9 Error comparison of tracking algorithms

Sequence	False positives (FairMOT)	False negatives (FairMOT)	IDS (FairMOT)	False positives (proposed)	False negatives (proposed)	IDS (proposed)
MOT17-01	383	2 289	31	71	2 352	21
MOT17-03	4 037	6 953	211	3 703	6 900	168
MOT17-07	1 050	4 832	122	486	5 198	75
MOT17-08	776	11 191	237	467	11 820	137
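Tab. 9 lists the three error types that enter the CLEAR MOT accuracy [24], MOTA = 1 − (FN + FP + IDS)/GT, where GT is the number of ground-truth boxes. Because GT is not given in the table, the Python sketch below compares only the summed errors per sequence, which for a fixed GT move MOTA in the opposite direction; it also includes IDF1 = 2·IDTP/(2·IDTP + IDFP + IDFN) from [25] for reference. Function names are illustrative.

```python
from typing import Dict, Tuple

# (false positives, false negatives, ID switches) per sequence, from Tab. 9.
FAIRMOT: Dict[str, Tuple[int, int, int]] = {
    "MOT17-01": (383, 2289, 31),
    "MOT17-03": (4037, 6953, 211),
    "MOT17-07": (1050, 4832, 122),
    "MOT17-08": (776, 11191, 237),
}
PROPOSED: Dict[str, Tuple[int, int, int]] = {
    "MOT17-01": (71, 2352, 21),
    "MOT17-03": (3703, 6900, 168),
    "MOT17-07": (486, 5198, 75),
    "MOT17-08": (467, 11820, 137),
}


def mota(fp: int, fn: int, ids: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDS) / number of ground-truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fp + fn + ids) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identification F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom > 0 else 0.0


for seq in FAIRMOT:
    print(seq, sum(FAIRMOT[seq]), sum(PROPOSED[seq]))
# MOT17-01 2703 2444
# MOT17-03 11201 10771
# MOT17-07 6004 5759
# MOT17-08 12204 12424
```

On MOT17-01, MOT17-03 and MOT17-07 the summed errors fall; on MOT17-08 they rise from 12 204 to 12 424 because the increase in false negatives (+629) outweighs the reductions in false positives (−309) and ID switches (−100).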

Fig. 5 Robustness improvement of overall algorithm on MOT17-03

Fig. 6 Robustness improvement of overall algorithm on MOT17-08

References 25

1	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. 10.1109/tpami.2016.2577031
2	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2022-03-20]..
3	WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]// Proceedings of the 2017 IEEE International Conference on Image Processing. Piscataway: IEEE, 2017: 3645-3649. 10.1109/icip.2017.8296962
4	WANG Z D, ZHENG L, LIU Y X, et al. Towards real-time multi-object tracking[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12356. Cham: Springer, 2020: 107-122.
5	ZHANG Y F, WANG C Y, WANG X G, et al. FairMOT: on the fairness of detection and re-identification in multiple object tracking[J]. International Journal of Computer Vision, 2021, 129(11): 3069-3087. 10.1007/s11263-021-01513-4
6	ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[EB/OL]. (2019-04-25)[2022-03-20]..
7	WEN J, LI Q. Object tracking algorithm based on spatio-temporal context information enhancement[J]. Journal of Computer Applications, 2021, 41(12): 3565-3570. (in Chinese) 10.11772/j.issn.1001-9081.2021061034
8	LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. (in Chinese)
9	SHAN Z C, HUANG D D, GENG Z Y, et al. Pedestrian multi-object tracking algorithm of anchor-free detection[J]. Computer Engineering and Applications, 2022, 58(10): 145-152. (in Chinese) 10.3778/j.issn.1002-8331.2011-0050
10	BEWLEY A, GE Z Y, OTT L, et al. Simple online and realtime tracking[C]// Proceedings of the 2016 IEEE International Conference on Image Processing. Piscataway: IEEE, 2016: 3464-3468. 10.1109/icip.2016.7533003
11	CHEN L, AI H Z, ZHUANG Z J, et al. Real-time multiple people tracking with deeply learned candidate selection and person re-identification[C]// Proceedings of the 2018 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2018: 1-6. 10.1109/icme.2018.8486597
12	ZHANG Y F, SUN P Z, JIANG Y, et al. ByteTrack: multi-object tracking by associating every detection box[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13682. Cham: Springer, 2022: 1-21.
13	ZHOU X Y, KOLTUN V, KRÄHENBÜHL P. Tracking objects as points[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 474-490.
14	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
15	SUN P Z, CAO J K, JIANG Y, et al. TransTrack: multiple object tracking with transformer[EB/OL]. (2021-05-04)[2022-03-20]..
16	PENG J L, WANG C A, WAN F B, et al. Chained-Tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12349. Cham: Springer, 2020: 145-161.
17	YU E, LI Z L, HAN S D, et al. RelationTrack: relation-aware multiple object tracking with decoupled representation[J]. IEEE Transactions on Multimedia, 2022, 2022(Early Access): 1-1. 10.1109/tmm.2022.3150169
18	LIANG C, ZHANG Z P, ZHOU X, et al. Rethinking the competition between detection and ReID in multi-object tracking[J]. IEEE Transactions on Image Processing, 2022, 31: 3182-3196. 10.1109/tip.2022.3165376
19	ESS A, LEIBE B, SCHINDLER K, et al. A mobile vision system for robust multi-person tracking[C]// Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2008: 24-26. 10.1109/cvpr.2008.4587581
20	ZHANG S S, BENENSON R, SCHIELE B. CityPersons: a diverse dataset for pedestrian detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4457-4465. 10.1109/cvpr.2017.474
21	DOLLÁR P, WOJEK C, SCHIELE B, et al. Pedestrian detection: a benchmark[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 304-311. 10.1109/cvpr.2009.5206631
22	XIAO T, LI S, WANG B C, et al. Joint detection and identification feature learning for person search[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3376-3385. 10.1109/cvpr.2017.360
23	ZHONG Z, ZHENG L, ZHENG Z D, et al. CamStyle: a novel data augmentation method for person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(3): 1176-1190. 10.1109/tip.2018.2874313
24	BERNARDIN K, STIEFELHAGEN R. Evaluating multiple object tracking performance: the CLEAR MOT metrics[J]. EURASIP Journal on Image and Video Processing, 2008, 2008: No.246309. 10.1155/2008/246309
25	RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 17-35.
