Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 833-843.DOI: 10.11772/j.issn.1001-9081.2021030432
Special Issue: Artificial intelligence
• Artificial intelligence •
Wenqiu ZHU, Guang ZOU, Zhigao ZENG
Received: 2021-03-22
Revised: 2021-06-15
Accepted: 2021-06-17
Online: 2022-04-09
Published: 2022-03-10
Contact: Wenqiu ZHU
About author: ZOU Guang, born in 1997, from Yueyang, Hunan, M. S. candidate. His research interests include digital image processing and object tracking.
Wenqiu ZHU, Guang ZOU, Zhigao ZENG. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.
朱文球, 邹广, 曾志高. 融合层次特征和混合注意力的目标跟踪算法[J]. 计算机应用, 2022, 42(3): 833-843.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030432
Block name | Operation | Template image size | Search image size |
---|---|---|---|
— | — | 127×127×3 | 255×255×3 |
Block1 | 7×7, 64, 3×3 max pooling, s=2 | 31×31×64 | 62×62×64 |
Block2 | | 15×15×256 | 31×31×256 |
Hybrid-Attn | — | 15×15×256 | 31×31×256 |
Block3 + Dilation | | 15×15×512 | 31×31×512 |
Block4 + Dilation | | 15×15×1024 | 31×31×1024 |
Hybrid-Attn | — | 15×15×1024 | 31×31×1024 |
Block5 + Dilation | | 15×15×2048 | 31×31×2048 |
Tab. 1 Network structure and corresponding operation of each block
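The feature sizes in Tab. 1 can be related to the size of the response map with a small calculation. The sketch below assumes that the template features are slid over the search features as a kernel without padding (a "valid" cross-correlation, as in SiamFC-style trackers); the page itself does not state this step, and the function name `xcorr_output_size` is hypothetical.

```python
def xcorr_output_size(search_size: int, template_size: int, stride: int = 1) -> int:
    """Side length of a 'valid' sliding-window cross-correlation response map.

    Assumption (not stated in the source): no padding is applied, so the
    template can only be placed where it fits fully inside the search features.
    """
    if template_size > search_size:
        raise ValueError("template must not be larger than the search region")
    return (search_size - template_size) // stride + 1


# Spatial sizes copied from Tab. 1: template 15x15, search 31x31
# (the same for Block2 through Block5).
TEMPLATE_SIDE = 15
SEARCH_SIDE = 31

response_side = xcorr_output_size(SEARCH_SIDE, TEMPLATE_SIDE)
print(f"response map: {response_side}x{response_side}")
```

Under this assumption every block listed with a 15×15 template and a 31×31 search map would give a response map of the same side length, which is what makes responses from different blocks directly combinable.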
Algorithm | Success rate | Precision | Algorithm | Success rate | Precision |
---|---|---|---|---|---|
DeepSiamFC-Attn | 50.1 | 49.4 | CFNet | 25.8 | 31.2 |
SiamDW | 39.7 | 43.7 | SRDCF | 24.5 | 24.8 |
SiamFC | 38.2 | 42.0 | Staple | 24.0 | 27.8 |
DSiam | 36.2 | 40.5 | CSR-DCF | 22.4 | 25.4 |
ECO | 32.9 | 33.8 | KCF | 15.6 | 19.0 |
BACF | 26.3 | 28.3 | | | |
Tab. 2 Algorithm evaluation results on the LaSOT dataset
Algorithm | Accuracy/% | Robustness/% | Average overlap rate (EAO)/% | Average speed/(frame·s⁻¹) |
---|---|---|---|---|
DeepSiamFC-Attn | 57.3 | 30.7 | 28.4 | 52.2 |
DeepSRDCF | 49.8 | 39.3 | 25.3 | 41.6 |
MDNet | 54.5 | 38.6 | 25.7 | 4.6 |
DSiam | 51.2 | 64.6 | 24.4 | 44.1 |
SiamFC | 50.1 | 58.8 | 18.6 | 54.5 |
CSR-DCF | 44.5 | 66.3 | 24.1 | 13.6 |
ECO | 48.4 | 32.9 | 26.0 | 77.6 |
CFNet | 43.7 | 59.4 | 17.8 | 30.5 |
Tab. 3 Evaluation results on the VOT2018 dataset
Sequence | Selected frames | Challenge attributes |
---|---|---|
Bolt | 24, 60, 124 | OCC, DEF, IPR, OPR |
David3 | 62, 85, 188 | OCC, DEF, OPR, BC |
Matrix | 44, 53, 75 | IV, SV, OCC, FM, IPR, BC |
Singer2 | 45, 210, 316 | IV, DEF, IPR, OPR, BC |
Skating1 | 184, 197, 318 | SV, OCC, DEF, OPR, BC |
Walking2 | 197, 219, 241 | SV, OCC, LR |
Tab. 4 Challenge attributes included in each test sequence
Block combination | Precision | Success rate | Block combination | Precision | Success rate |
---|---|---|---|---|---|
SiamFC | 77.10 | 58.32 | Block2+Block4 | 81.25 | 63.43 |
Block1+Block2 | 76.41 | 57.44 | Block2+Block5 | 80.43 | 61.45 |
Block1+Block3 | 78.27 | 58.52 | Block3+Block4 | 79.35 | 61.55 |
Block1+Block4 | 78.39 | 58.33 | Block3+Block5 | 78.21 | 61.23 |
Block1+Block5 | 79.13 | 59.18 | Block4+Block5 | 76.33 | 58.43 |
Block2+Block3 | 79.44 | 60.32 | | | |
Tab. 5 Comparison of experimental results for different network block combinations on the OTB100 dataset
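To make the comparison in Tab. 5 easier to read, the following sketch ranks the block combinations by success rate and reports the gain of the best one over the SiamFC row. The numbers are copied from the table; the variable names are illustrative only, and ranking by success rate (rather than precision) is a choice made here, not one stated by the authors.

```python
# (precision, success rate) on OTB100, copied from Tab. 5.
OTB100_RESULTS = {
    "SiamFC": (77.10, 58.32),
    "Block1+Block2": (76.41, 57.44),
    "Block1+Block3": (78.27, 58.52),
    "Block1+Block4": (78.39, 58.33),
    "Block1+Block5": (79.13, 59.18),
    "Block2+Block3": (79.44, 60.32),
    "Block2+Block4": (81.25, 63.43),
    "Block2+Block5": (80.43, 61.45),
    "Block3+Block4": (79.35, 61.55),
    "Block3+Block5": (78.21, 61.23),
    "Block4+Block5": (76.33, 58.43),
}

siamfc_precision, siamfc_success = OTB100_RESULTS["SiamFC"]

# Exclude the baseline row, then pick the combination with the highest success rate.
combinations = {k: v for k, v in OTB100_RESULTS.items() if k != "SiamFC"}
best_combination = max(combinations, key=lambda name: combinations[name][1])
best_precision, best_success = combinations[best_combination]

# Gains over the SiamFC row, rounded to the table's two decimal places.
precision_gain = round(best_precision - siamfc_precision, 2)
success_gain = round(best_success - siamfc_success, 2)

print(best_combination, precision_gain, success_gain)
```

In this table the same combination also has the highest precision, so ranking by either column selects the same row.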
Hybrid attention mechanism | A/%↑ | R/%↓ | EAO/%↑ | ΔEAO/% |
---|---|---|---|---|
SiamFC | 50.1 | 58.8 | 18.6 | — |
Base | 51.3 | 47.7 | 21.7 | +3.1 |
Base+CA | 53.6 | 37.8 | 24.8 | +6.2 |
Base+SA | 55.9 | 33.1 | 26.0 | +7.4 |
Base+CA+SA | 57.3 | 30.7 | 28.4 | +9.8 |
Tab. 6 Comparison of experimental results for different hybrid attention mechanisms on the VOT2018 dataset
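The ΔEAO column in Tab. 6 appears to be the gain in EAO over the SiamFC row. That reading is an assumption, but the short check below reproduces every value in the column from the EAO figures in the table; the variable names are illustrative.

```python
# EAO values (in %) copied from Tab. 6.
EAO_PERCENT = {
    "SiamFC": 18.6,
    "Base": 21.7,
    "Base+CA": 24.8,
    "Base+SA": 26.0,
    "Base+CA+SA": 28.4,
}

# Assumption: the reference point for the delta column is the SiamFC row.
reference_eao = EAO_PERCENT["SiamFC"]

# Difference from the reference, rounded to the table's one decimal place.
delta_eao = {
    name: round(value - reference_eao, 1)
    for name, value in EAO_PERCENT.items()
    if name != "SiamFC"
}

for name, delta in delta_eao.items():
    print(f"{name}: +{delta}")
```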
1 | BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865. 10.1007/978-3-319-48881-3_56 |
2 | VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2805-2813. 10.1109/cvpr.2017.531 |
3 | ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4591-4600. 10.1109/cvpr.2019.00472 |
4 | LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. 10.1109/cvpr.2018.00935 |
5 | WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4854-4863. 10.1109/cvpr.2018.00510 |
6 | PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. [2020-10-10]. |
7 | FEI D S, SONG H H, ZHANG K H. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305. |
8 | LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. 10.11772/j.issn.1001-9081.2019122139 |
9 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
10 | YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-10-10]. |
11 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2020-10-10]. |
12 | CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5659-5667. 10.1109/cvpr.2017.667 |
13 | WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. 10.1109/cvpr42600.2020.01155 |
14 | HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577. |
15 | WU Y, LIM J, YANG M H. Online object tracking: a benchmark[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2411-2418. 10.1109/cvpr.2013.312 |
16 | WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. 10.1109/tpami.2014.2388226 |
17 | KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-53. |
18 | FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5374-5383. 10.1109/cvpr.2019.00552 |
19 | LUKEZIC A, VOJIR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6309-6318. 10.1109/cvpr.2017.515 |
20 | NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. 10.1109/cvpr.2016.465 |
21 | GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1763-1771. 10.1109/iccv.2017.196 |
22 | DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6638-6646. 10.1109/cvpr.2017.733 |
23 | GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1135-1143. 10.1109/iccv.2017.129 |
24 | DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4310-4318. 10.1109/iccv.2015.490 |
25 | BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1401-1409. 10.1109/cvpr.2016.156 |
26 | HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596. 10.1109/tpami.2014.2345390 |
27 | DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2015: 58-66. 10.1109/iccvw.2015.84 |