Object tracking algorithm based on spatio-temporal context information enhancement

doi:10.11772/j.issn.1001-9081.2021061034

Journal of Computer Applications, 2021, Vol. 41, Issue 12: 3565-3570. DOI: 10.11772/j.issn.1001-9081.2021061034

The 18th China Conference on Machine Learning (CCML 2021)

Jing WEN, Qiang LI

School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China

Received: 2021-05-12 Revised: 2021-07-18 Accepted: 2021-07-22 Online: 2021-12-28 Published: 2021-12-10
Contact: Jing WEN
About the author: LI Qiang, born in 1995 in Datong, Shanxi, M.S. candidate. His research interests include computer vision and image processing.
Supported by: Research Project of Postgraduate Education Reform in Shanxi Province (2020YJJG030)

Abstract

Making full use of the spatio-temporal context information in a video can significantly improve object tracking performance. However, most current deep-learning-based tracking algorithms locate the object using only the features of the current frame and ignore the spatio-temporal context of the same object in the preceding and following frames. As a result, the tracked object is easily disturbed by similar objects nearby, which introduces a potential cumulative error during localization. To retain spatio-temporal context information, a short-term memory storage pool was added to the SiamMask algorithm to store the features of historical frames. In addition, an Appearance Saliency Boosting Module (ASBM) was proposed, which both enhances the salient features of the tracked object and suppresses interference from similar objects around it. On this basis, an object tracking algorithm based on spatio-temporal context information enhancement was proposed. Experiments were carried out on four datasets: VOT2016, VOT2018, DAVIS-2016 and DAVIS-2017. The results show that, compared with SiamMask, the proposed algorithm improves accuracy and Expected Average Overlap (EAO) by 4 and 2 percentage points respectively on VOT2016; improves accuracy, robustness and EAO by 3.7, 2.8 and 1 percentage points respectively on VOT2018; reduces the decay of both the region similarity and contour accuracy indicators by 0.2 percentage points on DAVIS-2016; and reduces the decay of the region similarity and contour accuracy indicators by 1.3 and 0.9 percentage points respectively on DAVIS-2017.

Key words: object tracking, context information, salient feature, feature enhancement, deep learning
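The abstract names two ingredients, a short-term memory pool of historical frame features and a module that boosts the target's salient features, without giving their internals. The following is only a minimal sketch of the general idea under loudly stated assumptions: features are plain vectors, the pool is a fixed-size FIFO, and the boosting step is a cosine-similarity-weighted blend. The names `ShortTermMemory`, `boost`, `capacity` and `alpha` are hypothetical and are not taken from the paper; this is not the authors' ASBM.

```python
from collections import deque
import math


class ShortTermMemory:
    """Fixed-size FIFO pool of per-frame feature vectors (hypothetical design)."""

    def __init__(self, capacity=3):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.frames = deque(maxlen=capacity)

    def push(self, feature):
        self.frames.append(list(feature))

    def __len__(self):
        return len(self.frames)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def boost(current, memory, alpha=0.5):
    """Blend the current feature with a similarity-weighted mean of stored features."""
    if len(memory) == 0:
        return list(current)
    sims = [cosine(current, past) for past in memory.frames]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [
        sum(w * past[i] for w, past in zip(weights, memory.frames))
        for i in range(len(current))
    ]
    return [(1 - alpha) * c + alpha * m for c, m in zip(current, context)]


memory = ShortTermMemory(capacity=2)
for feat in ([1.0, 0.0], [0.9, 0.1], [0.8, 0.2]):
    memory.push(feat)  # the first vector is evicted once the pool is full
boosted = boost([1.0, 0.0], memory)
print(len(memory), boosted)
```

Historical frames that resemble the current target feature receive larger weights, so a nearby distractor with a different appearance contributes less to the blended feature. A real tracker would apply this to convolutional feature maps rather than short vectors.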

CLC number: TP391.413

Jing WEN, Qiang LI. Object tracking algorithm based on spatio-temporal context information enhancement[J]. Journal of Computer Applications, 2021, 41(12): 3565-3570.

Figures/Tables: 13

References: 24

1	MAKOVSKI T, VÁZQUEZ G A, JIANG Y V. Visual learning in multiple-object tracking[J]. PLoS ONE, 2008, 3(5): No.e2228. DOI: 10.1371/journal.pone.0002228
2	HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390
3	DANELLJAN M, KHAN F S, FELSBERG M, et al. Adaptive color attributes for real-time visual tracking[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1090-1097. DOI: 10.1109/cvpr.2014.143
4	DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond correlation filters: learning continuous convolution operators for visual tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9909. Cham: Springer, 2016: 472-488. DOI: 10.3384/diss.diva-147543
5	BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 850-865.
6	LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935
7	WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1328-1338. DOI: 10.1109/cvpr.2019.00142
8	GU X Q, CHANG H, MA B P, et al. Appearance-preserving 3D convolution for video-based person re-identification[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12347. Cham: Springer, 2020: 228-243.
9	LAMPLE G, SABLAYROLLES A, RANZATO M, et al. Large memory layers with product keys[EB/OL]. (2019-12-16) [2021-03-20].
10	ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11213. Cham: Springer, 2018: 103-119. DOI: 10.1007/978-3-030-01240-3_7
11	LI B, WU W, WANG Q, et al. SiamRPN++: evolution of Siamese visual tracking with very deep networks[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4277-4286. DOI: 10.1109/cvpr.2019.00441
12	DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: accurate tracking by overlap maximization[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4655-4664. DOI: 10.1109/cvpr.2019.00479
13	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
14	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
15	RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. DOI: 10.1007/s11263-015-0816-y
16	XU N, YANG L J, FAN Y C, et al. YouTube-VOS: sequence-to-sequence video object segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11209. Cham: Springer, 2018: 603-619.
17	KRISTAN M, LEONARDIS A, MATAS J, et al. The Visual Object Tracking VOT2016 challenge results[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 777-823.
18	KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth Visual Object Tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11129. Cham: Springer, 2018: 3-53.
19	PERAZZI F, PONT-TUSET J, McWILLIAMS B, et al. A benchmark dataset and evaluation methodology for video object segmentation[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 724-732. DOI: 10.1109/cvpr.2016.85
20	PONT-TUSET J, PERAZZI F, CAELLES S, et al. The 2017 DAVIS Challenge on Video Object Segmentation[EB/OL]. (2018-03-01) [2021-03-20].
21	PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3491-3500. DOI: 10.1109/cvpr.2017.372
22	CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5320-5329. DOI: 10.1109/cvpr.2017.565
23	CHENG J C, TSAI Y H, WANG S J, et al. SegFlow: joint learning for video object segmentation and optical flow[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 686-695. DOI: 10.1109/iccv.2017.81
24	VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]// Proceedings of the 2017 British Machine Vision Conference. Durham: BMVA Press, 2017: No.116. DOI: 10.5244/c.31.116

Algorithm	Accuracy	Robustness	Expected average overlap (EAO)
DaSiamRPN	0.61	0.22	0.411
SiamRPN	0.56	0.26	0.344
ATOM	—	—	—
SiamRPN++	0.64	0.20	0.464
SiamMask-box	0.618	0.210	0.419
SiamMask-MBR	0.621	0.210	0.421
SiamAsbm-box	0.631	0.218	0.425
SiamAsbm-MBR	0.661	0.214	0.434

Algorithm	Accuracy	Robustness	Expected average overlap (EAO)
DaSiamRPN^[10]	0.569	0.337	0.326
SiamRPN^[6]	0.490	0.460	0.244
ATOM^[12]	0.590	0.204	0.401
SiamRPN++^[11]	0.600	0.234	0.414
SiamMask-box	0.589	0.300	0.360
SiamMask-MBR^[7]	0.592	0.286	0.359
SiamAsbm-box	0.592	0.295	0.364
SiamAsbm-MBR	0.629	0.258	0.370
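The gains quoted in the abstract are differences in percentage points between the SiamMask-MBR and SiamAsbm-MBR rows. The sketch below recomputes them from the two tracking tables above. It assumes scores are on a 0-1 scale and that a lower robustness value is better, which is how the abstract treats the drop from 0.286 to 0.258. The helper names `point_gain` and `compare` are illustrative only.

```python
def point_gain(baseline, proposed, lower_is_better=False):
    """Improvement in percentage points for scores reported on a 0-1 scale."""
    diff = baseline - proposed if lower_is_better else proposed - baseline
    return round(diff * 100, 1)


def compare(baseline_row, proposed_row):
    """Rows are (accuracy, robustness, EAO) tuples copied from a table."""
    return {
        "accuracy": point_gain(baseline_row[0], proposed_row[0]),
        "robustness": point_gain(baseline_row[1], proposed_row[1], lower_is_better=True),
        "eao": point_gain(baseline_row[2], proposed_row[2]),
    }


# SiamMask-MBR and SiamAsbm-MBR rows from the two tracking tables.
first_table = compare((0.621, 0.210, 0.421), (0.661, 0.214, 0.434))
second_table = compare((0.592, 0.286, 0.359), (0.629, 0.258, 0.370))

print(first_table)
print(second_table)
```

Under these assumptions the second table gives gains of 3.7, 2.8 and 1.1 points for accuracy, robustness and EAO, in line with the VOT2018 figures quoted in the abstract. The first table gives 4.0 points for accuracy and 1.3 for EAO, with robustness 0.4 points worse.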

Modules added to the baseline			Accuracy	Robustness	EAO
Feature stacking	Feature alignment	Feature enhancement
			0.592	0.286	0.359
√			0.589	0.300	0.354
√	√		0.579	0.272	0.360
√		√	0.610	0.290	0.355
√	√	√	0.629	0.258	0.370
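To read the ablation table more easily, the change of each configuration relative to the baseline row can be tabulated. The sketch below assumes the row with no check marks is the baseline and uses the column order (feature stacking, feature alignment, feature enhancement). The names `BASELINE`, `ABLATION` and `deltas` are illustrative.

```python
BASELINE = (0.592, 0.286, 0.359)  # accuracy, robustness, EAO with no added module

# (stacking, alignment, enhancement) -> (accuracy, robustness, EAO), from the table above
ABLATION = {
    (True, False, False): (0.589, 0.300, 0.354),
    (True, True, False): (0.579, 0.272, 0.360),
    (True, False, True): (0.610, 0.290, 0.355),
    (True, True, True): (0.629, 0.258, 0.370),
}


def deltas(scores, baseline=BASELINE):
    """Change versus the baseline, in percentage points, for each metric."""
    return tuple(round((s - b) * 100, 1) for s, b in zip(scores, baseline))


changes = {modules: deltas(scores) for modules, scores in ABLATION.items()}
best = max(ABLATION, key=lambda modules: ABLATION[modules][2])  # highest EAO

for modules, change in changes.items():
    print(modules, change)
print("best by EAO:", best)
```

In this table only the configuration with all three modules enabled improves accuracy, robustness (lower is better) and EAO at the same time. Feature stacking alone is slightly worse than the baseline on all three.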

Algorithm	Region similarity (J)			Contour accuracy (F)			Temporal stability (T)
	JM	JO	JD	FM	FO	FD	TM
Msk^[21]	0.792	0.924	0.094	0.749	0.864	0.093	0.222
Osvos^[22]	0.797	0.933	0.151	0.806	0.922	0.155	0.348
SegFlow^[23]	0.761	0.906	0.121	0.760	0.855	0.104	0.194
SiamMask^[7]	0.712	0.862	0.051	0.663	0.759	0.073	0.279
Proposed algorithm	0.714	0.854	0.049	0.666	0.751	0.071	0.279

Algorithm	Region similarity (J)			Contour accuracy (F)			Temporal stability (T)
	JM	JO	JD	FM	FO	FD	TM
OnAVOS^[24]	0.616	0.674	0.279	0.691	0.754	0.266	0.431
Osvos^[22]	0.566	0.636	0.261	0.639	0.736	0.270	0.529
SiamMask^[7]	0.534	0.628	0.193	0.585	0.675	0.209	0.451
Proposed algorithm	0.609	0.704	0.180	0.611	0.665	0.200	0.430
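The JD and FD columns report decay, that is, how much a per-frame score degrades over the course of a sequence. The page does not define the measure, so the function below is a simplified sketch that assumes decay is the mean of the first quarter of frames minus the mean of the last quarter. The official DAVIS evaluation code may bin frames differently, and the per-frame scores used here are made up for illustration. The last lines recompute the decay reductions from the SiamMask and proposed-algorithm rows of the table directly above.

```python
def decay(per_frame_scores, bins=4):
    """Mean of the first bin minus mean of the last bin (simplified assumption)."""
    n = len(per_frame_scores)
    if n < bins:
        raise ValueError("need at least one frame per bin")
    size = n // bins
    first = per_frame_scores[:size]
    last = per_frame_scores[-size:]
    return sum(first) / size - sum(last) / size


# Hypothetical per-frame overlap scores that degrade over time.
scores = [0.9, 0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6]
example_decay = round(decay(scores), 3)

# Decay reductions in percentage points, SiamMask versus the proposed algorithm.
jd_reduction = round((0.193 - 0.180) * 100, 1)
fd_reduction = round((0.209 - 0.200) * 100, 1)

print(example_decay, jd_reduction, fd_reduction)
```

A lower decay means the tracker holds its quality better over long sequences, which is why the abstract reports the reductions in JD and FD as improvements.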
