Journal of Computer Applications, 2022, Vol. 42, Issue (3): 833-843. DOI: 10.11772/j.issn.1001-9081.2021030432
• Artificial Intelligence •
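Received: 2021-03-22
Revised: 2021-06-15
Accepted: 2021-06-17
Online: 2022-04-09
Published: 2022-03-10
Corresponding author: Wenqiu ZHU
About author: ZOU Guang (born 1997), male, from Yueyang, Hunan, M. S. candidate. His research interests include digital image processing and object tracking.
Supported by: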
Wenqiu ZHU (1,2), Guang ZOU (1,2), Zhigao ZENG (1,2)
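Abstract:
In object tracking, the Fully-Convolutional Siamese network tracker (SiamFC) has poor robustness and tends to lose the target in scenarios such as occlusion and illumination variation. To address this, an object tracking algorithm combining feature fusion and an attention mechanism was proposed. First, ResNet50 was adopted as the backbone network to extract richer target features. Second, an attention mechanism was used to filter the features. The filtered low-level and high-level template features were each cross-correlated with the corresponding search features. The resulting responses were then fused by adaptive weighting, which improves the network's ability to discriminate between positive and negative samples. On the OTB100 dataset, the proposed algorithm achieved a precision of 81.25% and a success rate of 64.06%. On the LaSOT dataset, its precision and success rate were 49.4% and 50.1% respectively. Experimental results show that the proposed algorithm outperforms SiamFC in tracking performance and is more robust in complex scenes.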
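The abstract describes two steps: cross-correlating low-level and high-level template features with the corresponding search features, then fusing the two responses by adaptive weighting. The following minimal NumPy sketch illustrates that step and is not the authors' implementation. `xcorr` performs SiamFC-style valid cross-correlation summed over channels. `adaptive_fusion` assumes the adaptive weights are a softmax over two learnable scalars (`logits`); this is our assumption, since the excerpt does not give the weighting formula. The feature shapes are taken from the Block2 and Block4 rows of Table 1, the best-performing pair in Table 5.

```python
import numpy as np


def xcorr(template: np.ndarray, search: np.ndarray) -> np.ndarray:
    """SiamFC-style cross-correlation: slide a (C, h, w) template over a
    (C, H, W) search feature map without padding and sum over channels.
    Returns a single-channel (H - h + 1, W - w + 1) response map."""
    c, h, w = template.shape
    c_s, big_h, big_w = search.shape
    if c != c_s:
        raise ValueError("template and search features need the same channel count")
    out = np.empty((big_h - h + 1, big_w - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out


def adaptive_fusion(responses, logits):
    """Weighted sum of response maps with weights softmax(logits).
    In a trained tracker the logits would be learnable (assumption, not from the paper)."""
    logits = np.asarray(logits, dtype=np.float64)
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    fused = sum(wt * r for wt, r in zip(weights, responses))
    return fused, weights


# Shapes from Table 1: Block2 (low-level) and Block4 (high-level) features.
rng = np.random.default_rng(0)
low = xcorr(rng.standard_normal((256, 15, 15)), rng.standard_normal((256, 31, 31)))
high = xcorr(rng.standard_normal((1024, 15, 15)), rng.standard_normal((1024, 31, 31)))
fused, weights = adaptive_fusion([low, high], logits=[0.0, 0.0])
print(fused.shape, weights)  # (17, 17) [0.5 0.5]
```

Summing over channels collapses each template/search pair into one 17×17 map (31 − 15 + 1 = 17). This is why the two levels can be fused by a plain weighted sum even though their channel counts differ (256 vs 1024).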
朱文球, 邹广, 曾志高. 融合层次特征和混合注意力的目标跟踪算法[J]. 计算机应用, 2022, 42(3): 833-843.
Wenqiu ZHU, Guang ZOU, Zhigao ZENG. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.
Block | Operation | Template size | Search size |
---|---|---|---|
Input | — | 127×127×3 | 255×255×3 |
Block1 | 7×7, 64; 3×3 max pool; s=2 | 31×31×64 | 62×62×64 |
Block2 | | 15×15×256 | 31×31×256 |
Hybrid-Attn | — | 15×15×256 | 31×31×256 |
Block3 + dilation | | 15×15×512 | 31×31×512 |
Block4 + dilation | | 15×15×1024 | 31×31×1024 |
Hybrid-Attn | — | 15×15×1024 | 31×31×1024 |
Block5 + dilation | | 15×15×2048 | 31×31×2048 |
Tab. 1 Network structure and the operation performed by each block
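In Table 1, Blocks 3 to 5 keep the 15×15 template and 31×31 search resolution while the channel count grows. Dilated convolution [10] with stride 1 makes this possible. The sketch below only does the output-size arithmetic. The 3×3 kernel and the dilation rates 1, 2 and 4 are illustrative assumptions, not values given in the excerpt.

```python
def conv_out_size(n: int, k: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length along one axis of a 2D convolution
    (same formula as the PyTorch Conv2d documentation)."""
    k_eff = dilation * (k - 1) + 1  # span covered by a dilated k-tap kernel
    return (n + 2 * padding - k_eff) // stride + 1


for n in (15, 31):  # template / search feature sizes from Table 1
    same = [conv_out_size(n, 3, padding=d, dilation=d) for d in (1, 2, 4)]  # hypothetical rates
    halved = conv_out_size(n, 3, stride=2, padding=1)
    print(n, "->", same, "vs stride 2:", halved)
# 15 -> [15, 15, 15] vs stride 2: 8
# 31 -> [31, 31, 31] vs stride 2: 16
```

With padding equal to the dilation rate, a 3×3 kernel keeps the map size while its span grows from 3 to 5 to 9 pixels. A stride-2 layer would instead roughly halve the maps and coarsen the correlation response.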
Algorithm | Success rate | Precision | Algorithm | Success rate | Precision |
---|---|---|---|---|---|
DeepSiamFC-Attn | 50.1 | 49.4 | CFNet | 25.8 | 31.2 |
SiamDW | 39.7 | 43.7 | SRDCF | 24.5 | 24.8 |
SiamFC | 38.2 | 42.0 | Staple | 24.0 | 27.8 |
DSiam | 36.2 | 40.5 | CSR-DCF | 22.4 | 25.4 |
ECO | 32.9 | 33.8 | KCF | 15.6 | 19.0 |
BACF | 26.3 | 28.3 | | | |
Tab. 2 Algorithm evaluation results on the LaSOT dataset (%)
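Table 2 and the OTB100 figures in the abstract report precision and success rate. The sketch below follows the usual OTB-style one-pass evaluation as we understand it:

- Precision is the fraction of frames whose predicted box centre lies within 20 pixels of the ground-truth centre.
- Success rate is the area under the success curve, i.e. the mean, over 21 IoU thresholds from 0 to 1, of the fraction of frames whose IoU exceeds the threshold.

Boxes are assumed to be `(x, y, w, h)` rows. The function names are our own and do not come from any official toolkit.

```python
import numpy as np


def centers(boxes: np.ndarray) -> np.ndarray:
    # (x, y, w, h) -> (cx, cy)
    return boxes[:, :2] + boxes[:, 2:] / 2.0


def iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Per-frame intersection-over-union of two (N, 4) arrays of (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def precision(pred, gt, threshold=20.0):
    """Fraction of frames with centre location error <= threshold pixels."""
    err = np.linalg.norm(centers(pred) - centers(gt), axis=1)
    return float(np.mean(err <= threshold))


def success_rate(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """Area under the success plot: mean over thresholds of the fraction of frames with IoU > t."""
    overlaps = iou(pred, gt)
    return float(np.mean([np.mean(overlaps > t) for t in thresholds]))


gt = np.array([[10, 10, 100, 100], [20, 20, 100, 100]], dtype=float)
pred = gt + np.array([[0, 0, 0, 0], [30, 0, 0, 0]], dtype=float)  # 2nd frame shifted 30 px
print(precision(pred, gt), round(success_rate(pred, gt), 3))  # 0.5 0.738
```

In the toy example the shifted frame fails the 20-pixel test but still overlaps the ground truth (IoU ≈ 0.54). This is why the two metrics can rank trackers differently, as in several rows of Table 2.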
Algorithm | Accuracy/% | Robustness/% | Expected average overlap (EAO)/% | Average speed/(frame·s⁻¹) |
---|---|---|---|---|
DeepSiamFC-Attn | 57.3 | 30.7 | 28.4 | 52.2 |
DeepSRDCF | 49.8 | 39.3 | 25.3 | 41.6 |
MDNet | 54.5 | 38.6 | 25.7 | 4.6 |
DSiam | 51.2 | 64.6 | 24.4 | 44.1 |
SiamFC | 50.1 | 58.8 | 18.6 | 54.5 |
CSR-DCF | 44.5 | 66.3 | 24.1 | 13.6 |
ECO | 48.4 | 32.9 | 26.0 | 77.6 |
CFNet | 43.7 | 59.4 | 17.8 | 30.5 |
Tab. 3 Evaluation results on the VOT2018 dataset
Sequence | Selected frames | Challenge attributes |
---|---|---|
Bolt | 24, 60, 124 | OCC, DEF, IPR, OPR |
David3 | 62, 85, 188 | OCC, DEF, OPR, BC |
Matrix | 44, 53, 75 | IV, SV, OCC, FM, IPR, BC |
Singer2 | 45, 210, 316 | IV, DEF, IPR, OPR, BC |
Skating1 | 184, 197, 318 | SV, OCC, DEF, OPR, BC |
Walking2 | 197, 219, 241 | SV, OCC, LR |
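Tab. 4 Challenge attributes of each test sequence (OTB attributes: IV = illumination variation, SV = scale variation, OCC = occlusion, DEF = deformation, FM = fast motion, IPR = in-plane rotation, OPR = out-of-plane rotation, BC = background clutters, LR = low resolution)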
Block combination | Precision | Success rate | Block combination | Precision | Success rate |
---|---|---|---|---|---|
SiamFC | 77.10 | 58.32 | Block2+Block4 | 81.25 | 63.43 |
Block1+Block2 | 76.41 | 57.44 | Block2+Block5 | 80.43 | 61.45 |
Block1+Block3 | 78.27 | 58.52 | Block3+Block4 | 79.35 | 61.55 |
Block1+Block4 | 78.39 | 58.33 | Block3+Block5 | 78.21 | 61.23 |
Block1+Block5 | 79.13 | 59.18 | Block4+Block5 | 76.33 | 58.43 |
Block2+Block3 | 79.44 | 60.32 | | | |
Tab. 5 Comparison of experimental results of different network block combinations on the OTB100 dataset (%)
Hybrid attention configuration | A/%↑ | R/%↓ | EAO/%↑ | ΔEAO/% |
---|---|---|---|---|
SiamFC | 50.1 | 58.8 | 18.6 | — |
Base | 51.3 | 47.7 | 21.7 | +3.1 |
Base+CA | 53.6 | 37.8 | 24.8 | +6.2 |
Base+SA | 55.9 | 33.1 | 26.0 | +7.4 |
Base+CA+SA | 57.3 | 30.7 | 28.4 | +9.8 |
Tab. 6 Comparison of experimental results of different hybrid attention configurations on the VOT2018 dataset
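Tables 1 and 6 place a Hybrid-Attn module after Block2 and Block4, and Table 6 ablates its CA and SA parts. The excerpt does not define the module, so the sketch below makes two assumptions. First, CA and SA stand for channel attention and spatial attention. Second, the two parts are chained CBAM-style, with channel attention applied before spatial attention. The MLP weights `w1` and `w2`, the reduction ratio of 16 and the 7×7 spatial kernel are hypothetical choices, not the authors' settings.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W). Pool each channel to a scalar (mean and max), pass both
    through a shared two-layer MLP, and rescale channels by sigmoid(sum)."""
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer
    gate = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)
    return x * gate[:, None, None]


def spatial_attention(x, kernel):
    """Pool across channels (mean and max) -> (2, H, W), apply a zero-padded
    'same' k x k convolution with a (2, k, k) kernel, rescale positions by sigmoid."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    h, w = x.shape[1:]
    logits = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return x * sigmoid(logits)[None, :, :]


def hybrid_attention(x, w1, w2, kernel):
    # Channel attention followed by spatial attention (CBAM-style ordering, an assumption).
    return spatial_attention(channel_attention(x, w1, w2), kernel)


rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 15, 15))  # Block2 template feature size from Table 1
w1 = rng.standard_normal((256 // 16, 256)) * 0.1  # hypothetical reduction ratio 16
w2 = rng.standard_normal((256, 256 // 16)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1  # hypothetical 7x7 spatial kernel
out = hybrid_attention(feat, w1, w2, kernel)
print(out.shape)  # (256, 15, 15)
```

Both gates lie in (0, 1), so in this sketch the module can only re-weight features (suppressing less useful channels and positions), never enlarge them. Table 6 reports that using both parts gives the largest EAO gain over SiamFC (+9.8).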
References:
[1] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865. DOI: 10.1007/978-3-319-48881-3_56.
[2] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2805-2813. DOI: 10.1109/cvpr.2017.531.
[3] ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4591-4600. DOI: 10.1109/cvpr.2019.00472.
[4] LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. DOI: 10.1109/cvpr.2018.00935.
[5] WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4854-4863. DOI: 10.1109/cvpr.2018.00510.
[6] PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. [2020-10-10]. DOI: 10.1007/s11263-019-01283-0.
[7] 费大胜, 宋慧慧, 张开华. 基于多层特征增强的实时视觉跟踪[J]. 计算机应用, 2020, 40(11): 3300-3305.
FEI D S, SONG H H, ZHANG K H. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305.
[8] 李生武, 张选德. 基于自注意力机制的多域卷积神经网络的视觉追踪[J]. 计算机应用, 2020, 40(8): 2219-2224.
LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. DOI: 10.11772/j.issn.1001-9081.2019122139.
[9] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90.
[10] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-10-10].
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2020-10-10].
[12] CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5659-5667. DOI: 10.1109/cvpr.2017.667.
[13] WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. DOI: 10.1109/cvpr42600.2020.01155.
[14] HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1562-1577.
[15] WU Y, LIM J, YANG M H. Online object tracking: a benchmark[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2411-2418. DOI: 10.1109/cvpr.2013.312.
[16] WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. DOI: 10.1109/tpami.2014.2388226.
[17] KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-53.
[18] FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5374-5383. DOI: 10.1109/cvpr.2019.00552.
[19] LUKEZIC A, VOJIR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6309-6318. DOI: 10.1109/cvpr.2017.515.
[20] NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. DOI: 10.1109/cvpr.2016.465.
[21] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1763-1771. DOI: 10.1109/iccv.2017.196.
[22] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6638-6646. DOI: 10.1109/cvpr.2017.733.
[23] GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1135-1143. DOI: 10.1109/iccv.2017.129.
[24] DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4310-4318. DOI: 10.1109/iccv.2015.490.
[25] BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1401-1409. DOI: 10.1109/cvpr.2016.156.
[26] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(3): 583-596. DOI: 10.1109/tpami.2014.2345390.
[27] DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2015: 58-66. DOI: 10.1109/iccvw.2015.84.