Object tracking algorithm based on spatio-temporal context information enhancement

doi:10.11772/j.issn.1001-9081.2021061034

Abstract

Making full use of the spatio-temporal context information in a video can significantly improve object tracking performance. However, most current deep-learning-based object tracking algorithms locate the object using only the features of the current frame. They ignore the spatio-temporal context of the same object in the frames before and after the current one. As a result, the tracked object is easily disturbed by nearby similar objects, and a potential cumulative error is introduced during tracking and localization. To retain spatio-temporal context information, a short-term memory storage pool was introduced into the SiamMask algorithm to store the features of historical frames. Meanwhile, an Appearance Saliency Boosting Module (ASBM) was proposed. The module both enhances the salient features of the tracked object and suppresses interference from similar objects around it. On this basis, an object tracking algorithm based on spatio-temporal context information enhancement was proposed. To verify its performance, experiments were carried out on four datasets: VOT2016, VOT2018, DAVIS-2016 and DAVIS-2017. Experimental results show the following gains of the proposed algorithm over SiamMask:
- On VOT2016, accuracy and Expected Average Overlap (EAO) increase by 4 and 2 percentage points respectively.
- On VOT2018, accuracy, robustness and EAO improve by 3.7, 2.8 and 1 percentage points respectively.
- On DAVIS-2016, the decay of both the region similarity and contour accuracy metrics is reduced by 0.2 percentage points.
- On DAVIS-2017, the decay of the region similarity and contour accuracy metrics is reduced by 1.3 and 0.9 percentage points respectively.

Key words: object tracking, context information, salient feature, feature enhancement, deep learning
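The abstract does not specify how the memory pool or ASBM is built. The following is only a minimal sketch under stated assumptions, not the authors' implementation:
- Features are flat lists of floats standing in for convolutional feature maps.
- The memory pool is a fixed-capacity first-in-first-out buffer.
- "Enhancement" is a softmax over cosine similarities between the current feature and each stored feature. The weighted context is added residually to the current feature.

The names `ShortTermMemory`, `attention_weights` and `enhance`, and the `temperature` parameter, are all hypothetical.

```python
import math
from collections import deque


class ShortTermMemory:
    """Hypothetical short-term memory pool: keeps the features of the last
    `capacity` frames and silently evicts the oldest one when full."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, feature):
        self.frames.append(list(feature))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attention_weights(current, memory, temperature=0.1):
    """Softmax over cosine similarities: stored frames that resemble the
    current target get most of the weight, dissimilar ones almost none."""
    if not memory.frames:
        return []
    scores = [cosine(current, f) / temperature for f in memory.frames]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def enhance(current, memory, temperature=0.1):
    """Residual enhancement: current feature + similarity-weighted context."""
    if not memory.frames:
        return list(current)
    weights = attention_weights(current, memory, temperature)
    context = [
        sum(w * f[i] for w, f in zip(weights, memory.frames))
        for i in range(len(current))
    ]
    return [c + x for c, x in zip(current, context)]


mem = ShortTermMemory(capacity=3)
for feat in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]):
    mem.push(feat)  # the first frame is evicted on the fourth push

print(len(mem.frames))
print([round(w, 3) for w in attention_weights([1.0, 0.0], mem)])
print([round(v, 3) for v in enhance([1.0, 0.0], mem)])
```

In this toy run, the stored feature `[0.0, 1.0]` points in a different direction from the current target feature `[1.0, 0.0]`, so it receives a weight close to zero. The two target-like features dominate the fused context. A real tracker would apply such a step to feature maps from the Siamese backbone, and the actual ASBM design may differ.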

CLC Number: TP391.413

Jing WEN, Qiang LI. Object tracking algorithm based on spatio-temporal context information enhancement[J]. Journal of Computer Applications, 2021, 41(12): 3565-3570.

References

1	MAKOVSKI T, VÁZQUEZ G A, JIANG Y V. Visual learning in multiple-object tracking[J]. PLoS ONE, 2008, 3(5): No. e2228. DOI: 10.1371/journal.pone.0002228
2	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390
3	DANELLJAN M, KHAN F S, FELSBERG M, et al. Adaptive color attributes for real-time visual tracking[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1090-1097. DOI: 10.1109/cvpr.2014.143
4	DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9909. Cham: Springer, 2016: 472-488. DOI: 10.3384/diss.diva-147543
5	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 850-865.
6	LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935
7	WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1328-1338. DOI: 10.1109/cvpr.2019.00142
8	GU X Q, CHANG H, MA B P, et al. Appearance-preserving 3D convolution for video-based person re-identification[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12347. Cham: Springer, 2020: 228-243.
9	LAMPLE G, SABLAYROLLES A, RANZATO M, et al. Large memory layers with product keys[EB/OL]. (2019-12-16) [2021-03-20]..
10	ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11213. Cham: Springer, 2018: 103-119. DOI: 10.1007/978-3-030-01240-3_7
11	LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4277-4286. DOI: 10.1109/cvpr.2019.00441
12	DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: accurate tracking by overlap maximization[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4655-4664. DOI: 10.1109/cvpr.2019.00479
13	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
14	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
15	RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. DOI: 10.1007/s11263-015-0816-y
16	XU N, YANG L J, FAN Y C, et al. YouTube-VOS: sequence-to-sequence video object segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11209. Cham: Springer, 2018: 603-619.
17	KRISTAN M, LEONARDIS A, MATAS J, et al. The Visual Object Tracking VOT2016 challenge results[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 777-823.
18	KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth Visual Object Tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11129. Cham: Springer, 2018: 3-53.
19	PERAZZI F, PONT-TUSET J, McWILLIAMS B, et al. A benchmark dataset and evaluation methodology for video object segmentation[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 724-732. DOI: 10.1109/cvpr.2016.85
20	PONT-TUSET J, PERAZZI F, CAELLES S, et al. The 2017 DAVIS Challenge on Video Object Segmentation[EB/OL]. (2018-03-01) [2021-03-20]..
21	PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3491-3500. DOI: 10.1109/cvpr.2017.372
22	CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5320-5329. DOI: 10.1109/cvpr.2017.565
23	CHENG J C, TSAI Y H, WANG S J, et al. SegFlow: joint learning for video object segmentation and optical flow[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 686-695. DOI: 10.1109/iccv.2017.81
24	VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]// Proceedings of the 2017 British Machine Vision Conference. Durham: BMVA Press, 2017: No. 116. DOI: 10.5244/c.31.116

Results on VOT2016
Algorithm	Accuracy	Robustness (lower is better)	Expected Average Overlap (EAO)
DaSiamRPN	0.61	0.22	0.411
SiamRPN	0.56	0.26	0.344
ATOM	—	—	—
SiamRPN++	0.64	0.20	0.464
SiamMask-box	0.618	0.210	0.419
SiamMask-MBR	0.621	0.210	0.421
SiamAsbm-box	0.631	0.218	0.425
SiamAsbm-MBR	0.661	0.214	0.434

Results on VOT2018
Algorithm	Accuracy	Robustness (lower is better)	Expected Average Overlap (EAO)
DaSiamRPN [10]	0.569	0.337	0.326
SiamRPN [6]	0.490	0.460	0.244
ATOM [12]	0.590	0.204	0.401
SiamRPN++ [11]	0.600	0.234	0.414
SiamMask-box	0.589	0.300	0.360
SiamMask-MBR [7]	0.592	0.286	0.359
SiamAsbm-box	0.592	0.295	0.364
SiamAsbm-MBR	0.629	0.258	0.370

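Ablation results on VOT2018 (modules added to the baseline; an empty row of modules is the baseline)
Feature stacking	Feature alignment	Feature enhancement	Accuracy	Robustness (lower is better)	Expected Average Overlap (EAO)
			0.592	0.286	0.359
√			0.589	0.300	0.354
√	√		0.579	0.272	0.360
√		√	0.610	0.290	0.355
√	√	√	0.629	0.258	0.370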
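In the VOT protocol, robustness is failure-based, so a smaller value is better. The gain for that column must therefore be computed with the sign reversed. The sketch below derives each configuration's change relative to the baseline row in percentage points, using values copied from the ablation table. The configuration labels and the helper names `gain_pp` and `ablation_gains` are illustrative only.

```python
# Ablation values on VOT2018, copied from the table above.
BASELINE = {"accuracy": 0.592, "robustness": 0.286, "eao": 0.359}
CONFIGS = {
    "stacking": {"accuracy": 0.589, "robustness": 0.300, "eao": 0.354},
    "stacking+alignment": {"accuracy": 0.579, "robustness": 0.272, "eao": 0.360},
    "stacking+enhancement": {"accuracy": 0.610, "robustness": 0.290, "eao": 0.355},
    "stacking+alignment+enhancement": {"accuracy": 0.629, "robustness": 0.258, "eao": 0.370},
}
# Robustness counts failures, so a decrease is an improvement.
LOWER_IS_BETTER = {"robustness"}


def gain_pp(metric, baseline, value):
    """Improvement over the baseline in percentage points; positive means better."""
    diff = baseline - value if metric in LOWER_IS_BETTER else value - baseline
    return round(diff * 100, 1)


def ablation_gains(baseline, configs):
    """Per-configuration, per-metric gains relative to the baseline row."""
    return {
        name: {m: gain_pp(m, baseline[m], v) for m, v in scores.items()}
        for name, scores in configs.items()
    }


for name, gains in ablation_gains(BASELINE, CONFIGS).items():
    print(name, gains)
```

With these numbers, feature stacking alone is slightly worse than the baseline on all three metrics. Only the full combination improves all of them: accuracy +3.7, robustness +2.8 and EAO +1.1 percentage points. This is consistent with the VOT2018 gains reported in the abstract.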