Journal of Computer Applications, 2022, 42(3): 833-843. DOI: 10.11772/j.issn.1001-9081.2021030432
Topic: Artificial intelligence
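Received: 2021-03-22
Revised: 2021-06-15
Accepted: 2021-06-17
Online: 2022-04-09
Published: 2022-03-10
Corresponding author: Wenqiu ZHU
About the author: ZOU Guang, born in 1997, male, from Yueyang, Hunan; M.S. candidate. His research interests include digital image processing and object tracking.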
Authors: Wenqiu ZHU (1, 2), Guang ZOU (1, 2), Zhigao ZENG (1, 2)
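Abstract:
In object tracking, the fully-convolutional Siamese network (SiamFC) tracker shows poor robustness and tends to lose the target in scenes with occlusion, illumination variation and similar challenges. To address this, an object tracking algorithm combining feature fusion and an attention mechanism was proposed. First, ResNet50 was adopted as the backbone network to extract richer target features. Second, an attention mechanism was used to filter the features; the filtered low-level and high-level template features were each cross-correlated with the corresponding search features, and the resulting responses were fused with adaptive weights, which improves the network's ability to discriminate between positive and negative samples. On the OTB100 dataset, the proposed algorithm achieved a precision of 81.25% and a success rate of 64.06%; on the LaSOT dataset, its precision and success rate were 49.4% and 50.1%, respectively. Experimental results show that the proposed algorithm outperforms SiamFC in tracking performance and is more robust in complex scenes.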
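The cross-correlation and adaptive weighted fusion described in the abstract can be illustrated with a minimal NumPy sketch. It assumes plain, unnormalized valid cross-correlation as in SiamFC and models the adaptive weights as a softmax over two learnable scalars; the names `xcorr` and `fuse`, the softmax weighting, and the choice of Block2 and Block4 (the best pair in Table 5) as the low- and high-level layers are illustrative assumptions, not the authors' implementation. The features are random tensors shaped like the Table 1 outputs.

```python
import numpy as np

def xcorr(z, x):
    """Valid cross-correlation of template features z (C, h, w) over search
    features x (C, H, W); returns an (H - h + 1, W - w + 1) response map."""
    _, h, w = z.shape
    _, H, W = x.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product between the template and one search window.
            out[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return out

def fuse(responses, logits):
    """Weighted sum of response maps with softmax-normalized weights.
    `logits` stands in for learnable fusion parameters (hypothetical)."""
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    return sum(wt * r for wt, r in zip(weights, responses))

rng = np.random.default_rng(0)
# Hypothetical random features shaped like Block2 (low) and Block4 (high) outputs.
z_low, x_low = rng.standard_normal((256, 15, 15)), rng.standard_normal((256, 31, 31))
z_high, x_high = rng.standard_normal((1024, 15, 15)), rng.standard_normal((1024, 31, 31))
r_low, r_high = xcorr(z_low, x_low), xcorr(z_high, x_high)
score = fuse([r_low, r_high], np.array([0.0, 0.0]))  # equal weights in this example
peak = np.unravel_index(np.argmax(score), score.shape)  # predicted target position
print(score.shape)  # (17, 17)
```

Correlating each level separately and fusing the 17×17 response maps avoids matching 256-channel and 1024-channel tensors directly. A real tracker would likely also normalize the two responses before weighting them, which this sketch omits.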
Citation: Wenqiu ZHU, Guang ZOU, Zhigao ZENG. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.
Block | Operation | Template feature size | Search feature size |
---|---|---|---|
Input | — | 127×127×3 | 255×255×3 |
Block1 | 7×7, 64; 3×3 max pool; s=2 | 31×31×64 | 62×62×64 |
Block2 | | 15×15×256 | 31×31×256 |
Hybrid-Attn | — | 15×15×256 | 31×31×256 |
Block3 + Dilation | | 15×15×512 | 31×31×512 |
Block4 + Dilation | | 15×15×1024 | 31×31×1024 |
Hybrid-Attn | — | 15×15×1024 | 31×31×1024 |
Block5 + Dilation | | 15×15×2048 | 31×31×2048 |
Tab. 1 Network structure and corresponding operation of each block
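Table 1 marks Block3 to Block5 with dilation, and the template and search features stay at 15×15 and 31×31 after Block2. The minimal NumPy sketch below shows why, assuming a single-channel, stride-1 convolution with a 3×3 kernel, dilation rate 2 and padding 2 (the table does not list the actual dilation rates or padding): the dilated kernel spans 5×5 input pixels, yet the output keeps the input size. The function name `dilated_conv2d` is hypothetical.

```python
import numpy as np

def dilated_conv2d(x, k, dilation=1, padding=0):
    """Stride-1, single-channel 2-D cross-correlation with a dilated kernel."""
    kh, kw = k.shape
    # A dilated kernel spans dilation * (size - 1) + 1 input pixels per axis.
    span_h = dilation * (kh - 1) + 1
    span_w = dilation * (kw - 1) + 1
    xp = np.pad(x, padding)  # zero padding on all four sides
    out_h = xp.shape[0] - span_h + 1
    out_w = xp.shape[1] - span_w + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Take every `dilation`-th pixel under the kernel's span.
            patch = xp[i:i + span_h:dilation, j:j + span_w:dilation]
            out[i, j] = np.sum(patch * k)
    return out

x = np.ones((15, 15))  # one channel of a 15×15 template feature map
k = np.ones((3, 3))
y = dilated_conv2d(x, k, dilation=2, padding=2)
print(y.shape)          # (15, 15): spatial size preserved
print(y[7, 7], y[0, 0])  # 9.0 4.0: interior vs. zero-padded corner
```

A conventional ResNet stage would halve the map with stride 2; trading that stride for dilation keeps the spatial resolution useful for localization while still enlarging the receptive field.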
Algorithm | Success rate | Precision |
---|---|---|
DeepSiamFC-Attn | 50.1 | 49.4 |
SiamDW | 39.7 | 43.7 |
SiamFC | 38.2 | 42.0 |
DSiam | 36.2 | 40.5 |
ECO | 32.9 | 33.8 |
BACF | 26.3 | 28.3 |
CFNet | 25.8 | 31.2 |
SRDCF | 24.5 | 24.8 |
Staple | 24.0 | 27.8 |
CSR-DCF | 22.4 | 25.4 |
KCF | 15.6 | 19.0 |
Tab. 2 Algorithm evaluation results on LaSOT dataset (%)
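Tables 2 and 5 report precision and success rate. Assuming the usual OTB-style definitions (the LaSOT toolkit follows the same idea), precision is the fraction of frames whose predicted box center lies within 20 pixels of the ground truth, and the success rate is the area under the success plot, i.e. the mean fraction of frames whose IoU exceeds each threshold from 0 to 1 in steps of 0.05. A minimal NumPy sketch with boxes in (x, y, w, h) format; the function names are illustrative, not part of any toolkit:

```python
import numpy as np

def iou(a, b):
    """Per-frame IoU of (N, 4) box arrays in (x, y, w, h) format."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / union

def center_error(a, b):
    """Euclidean distance between box centers, in pixels."""
    ca = a[:, :2] + a[:, 2:] / 2.0
    cb = b[:, :2] + b[:, 2:] / 2.0
    return np.linalg.norm(ca - cb, axis=1)

def precision(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is at most `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))

def success_auc(pred, gt, thresholds=None):
    """Area under the success plot: mean success rate over IoU thresholds."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)  # 21 thresholds, step 0.05
    overlaps = iou(pred, gt)
    return float(np.mean([np.mean(overlaps > t) for t in thresholds]))

gt = np.array([[0, 0, 10, 10], [0, 0, 10, 10]], dtype=float)
pred = np.array([[0, 0, 10, 10], [5, 0, 10, 10]], dtype=float)
print(precision(pred, gt))    # 1.0: both centers are within 20 px
print(success_auc(pred, gt))  # about 0.643 (IoUs are 1.0 and 1/3)
```

Table values are these fractions expressed as percentages and averaged over all sequences of a benchmark.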
Algorithm | Accuracy/%↑ | Robustness/%↓ | Expected average overlap (EAO)/%↑ | Average speed/(frame·s⁻¹) |
---|---|---|---|---|
DeepSiamFC-Attn | 57.3 | 30.7 | 28.4 | 52.2 |
DeepSRDCF | 49.8 | 39.3 | 25.3 | 41.6 |
MDNet | 54.5 | 38.6 | 25.7 | 4.6 |
DSiam | 51.2 | 64.6 | 24.4 | 44.1 |
SiamFC | 50.1 | 58.8 | 18.6 | 54.5 |
CSR-DCF | 44.5 | 66.3 | 24.1 | 13.6 |
ECO | 48.4 | 32.9 | 26.0 | 77.6 |
CFNet | 43.7 | 59.4 | 17.8 | 30.5 |
Tab. 3 Evaluation results on VOT2018 dataset
Sequence | Selected frames | Challenge attributes |
---|---|---|
Bolt | 24, 60, 124 | OCC, DEF, IPR, OPR |
David3 | 62, 85, 188 | OCC, DEF, OPR, BC |
Matrix | 44, 53, 75 | IV, SV, OCC, FM, IPR, BC |
Singer2 | 45, 210, 316 | IV, DEF, IPR, OPR, BC |
Skating1 | 184, 197, 318 | SV, OCC, DEF, OPR, BC |
Walking2 | 197, 219, 241 | SV, OCC, LR |
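Tab. 4 Challenge attributes included in each test sequence
OTB attribute abbreviations: OCC, occlusion; DEF, deformation; IPR, in-plane rotation; OPR, out-of-plane rotation; BC, background clutter; IV, illumination variation; SV, scale variation; FM, fast motion; LR, low resolution.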
Block combination | Precision | Success rate |
---|---|---|
SiamFC | 77.10 | 58.32 |
Block1+Block2 | 76.41 | 57.44 |
Block1+Block3 | 78.27 | 58.52 |
Block1+Block4 | 78.39 | 58.33 |
Block1+Block5 | 79.13 | 59.18 |
Block2+Block3 | 79.44 | 60.32 |
Block2+Block4 | 81.25 | 63.43 |
Block2+Block5 | 80.43 | 61.45 |
Block3+Block4 | 79.35 | 61.55 |
Block3+Block5 | 78.21 | 61.23 |
Block4+Block5 | 76.33 | 58.43 |
Tab. 5 Experimental results comparison of different network block combinations on OTB100 dataset (%)
Hybrid attention configuration | A/%↑ | R/%↓ | EAO/%↑ | ΔEAO/% |
---|---|---|---|---|
SiamFC | 50.1 | 58.8 | 18.6 | — |
Base | 51.3 | 47.7 | 21.7 | +3.1 |
Base+CA | 53.6 | 37.8 | 24.8 | +6.2 |
Base+SA | 55.9 | 33.1 | 26.0 | +7.4 |
Base+CA+SA | 57.3 | 30.7 | 28.4 | +9.8 |
Tab. 6 Experimental results comparison of various hybrid attention mechanisms on VOT2018 dataset
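Table 6 builds the hybrid attention up from a base model by adding CA and SA, read here as channel attention and spatial attention. The NumPy sketch below uses a CBAM-style arrangement (channel attention followed by spatial attention); that ordering, the reduction ratio of 16, the 7×7 spatial kernel and the random weights are all assumptions rather than the authors' design, and the BAM module in reference [6] combines the two branches in parallel instead. The input size follows the Hybrid-Attn row after Block2 in Table 1.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w1, w2):
    """CA: reweight the channels of x (C, H, W) with a shared two-layer MLP
    applied to the global average- and max-pooled channel descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer
    weights = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))
    return x * weights[:, None, None]

def spatial_attention(x, k):
    """SA: reweight the positions of x with a convolution over the channel-wise
    average and max maps; k has shape (2, kh, kw) with odd kh and kw."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    kh, kw = k.shape[1:]
    padded = np.pad(desc, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + kh, j:j + kw] * k)
    return x * sigmoid(logits)[None, :, :]

def hybrid_attention(x, w1, w2, k):
    """Channel attention followed by spatial attention (CBAM-style)."""
    return spatial_attention(channel_attention(x, w1, w2), k)

rng = np.random.default_rng(0)
c, r = 256, 16  # Block2 channels; reduction ratio r is an assumption
x = rng.standard_normal((c, 15, 15))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
k = rng.standard_normal((2, 7, 7)) * 0.1
y = hybrid_attention(x, w1, w2, k)
print(y.shape)  # (256, 15, 15)
```

Both attention maps lie in (0, 1), so the module only rescales features: it can suppress distracting channels and background positions but never amplifies them beyond the input.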
References:
[1] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865. DOI: 10.1007/978-3-319-48881-3_56.
[2] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2805-2813. DOI: 10.1109/cvpr.2017.531.
[3] ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4591-4600. DOI: 10.1109/cvpr.2019.00472.
[4] LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935.
[5] WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4854-4863. DOI: 10.1109/cvpr.2018.00510.
[6] PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. [2020-10-10]. DOI: 10.1007/s11263-019-01283-0.
[7] FEI D S, SONG H H, ZHANG K H. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305. (in Chinese)
[8] LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. DOI: 10.11772/j.issn.1001-9081.2019122139. (in Chinese)
[9] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90.
[10] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-10-10].
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2020-10-10].
[12] CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5659-5667. DOI: 10.1109/cvpr.2017.667.
[13] WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. DOI: 10.1109/cvpr42600.2020.01155.
[14] HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1562-1577.
[15] WU Y, LIM J, YANG M H. Online object tracking: a benchmark[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2411-2418. DOI: 10.1109/cvpr.2013.312.
[16] WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. DOI: 10.1109/tpami.2014.2388226.
[17] KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-53.
[18] FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5374-5383. DOI: 10.1109/cvpr.2019.00552.
[19] LUKEZIC A, VOJIR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6309-6318. DOI: 10.1109/cvpr.2017.515.
[20] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. DOI: 10.1109/cvpr.2016.465.
[21] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1763-1771. DOI: 10.1109/iccv.2017.196.
[22] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6638-6646. DOI: 10.1109/cvpr.2017.733.
[23] GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1135-1143. DOI: 10.1109/iccv.2017.129.
[24] DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4310-4318. DOI: 10.1109/iccv.2015.490.
[25] BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1401-1409. DOI: 10.1109/cvpr.2016.156.
[26] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390.
[27] DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2015: 58-66. DOI: 10.1109/iccvw.2015.84.