Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 833-843. DOI: 10.11772/j.issn.1001-9081.2021030432
Special Issue: Artificial Intelligence
• Artificial intelligence •
Wenqiu ZHU1,2, Guang ZOU1,2, Zhigao ZENG1,2
Received: 2021-03-22
Revised: 2021-06-15
Accepted: 2021-06-17
Online: 2022-04-09
Published: 2022-03-10
Contact: Wenqiu ZHU
About author: ZOU Guang, born in 1997 in Yueyang, Hunan, M. S. candidate. His research interests include digital image processing and object tracking.
Wenqiu ZHU, Guang ZOU, Zhigao ZENG. Object tracking algorithm with hierarchical features and hybrid attention[J]. Journal of Computer Applications, 2022, 42(3): 833-843.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030432
| Block | Operation | Template image size | Search image size |
|---|---|---|---|
| — | — | 127×127×3 | 255×255×3 |
| Block1 | 7×7, 64, 3×3 maxp, s=2 | 31×31×64 | 62×62×64 |
| Block2 | | 15×15×256 | 31×31×256 |
| Hybrid-Attn | — | 15×15×256 | 31×31×256 |
| Block3 + Dilation | | 15×15×512 | 31×31×512 |
| Block4 + Dilation | | 15×15×1024 | 31×31×1024 |
| Hybrid-Attn | — | 15×15×1024 | 31×31×1024 |
| Block5 + Dilation | | 15×15×2048 | 31×31×2048 |
Tab. 1 Network structure and corresponding operation of each block
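
As a quick consistency check on the feature-map sizes in Tab. 1, the sketch below computes the size of the response map that a SiamFC-style cross-correlation would produce from the final template and search features. It assumes an unpadded, stride-1 correlation as in SiamFC [1]; this page does not state how the paper combines the two branches, so that assumption and the function name `xcorr_output_size` are illustrative only.

```python
def xcorr_output_size(search_size: int, template_size: int, stride: int = 1) -> int:
    """Spatial size of an unpadded ("valid") cross-correlation response map."""
    if template_size > search_size:
        raise ValueError("template must not be larger than the search region")
    return (search_size - template_size) // stride + 1


# Spatial sizes of the template and search features listed in Tab. 1
# (identical from Block2 through Block5).
TEMPLATE_FEATURE_SIZE = 15
SEARCH_FEATURE_SIZE = 31

response_size = xcorr_output_size(SEARCH_FEATURE_SIZE, TEMPLATE_FEATURE_SIZE)
print(f"response map: {response_size}x{response_size}")
```

Because the dilated blocks keep the 15×15 and 31×31 spatial sizes, the same response size would apply to any of the Block2 to Block5 outputs under this assumption.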
| Algorithm | Success rate | Precision | Algorithm | Success rate | Precision |
|---|---|---|---|---|---|
| DeepSiamFC-Attn | 50.1 | 49.4 | CFNet | 25.8 | 31.2 |
| SiamDW | 39.7 | 43.7 | SRDCF | 24.5 | 24.8 |
| SiamFC | 38.2 | 42.0 | Staple | 24.0 | 27.8 |
| DSiam | 36.2 | 40.5 | CSR-DCF | 22.4 | 25.4 |
| ECO | 32.9 | 33.8 | KCF | 15.6 | 19.0 |
| BACF | 26.3 | 28.3 | | | |
Tab. 2 Algorithm evaluation results on the LaSOT dataset
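
This page does not define the success rate and precision columns of Tab. 2. The sketch below assumes the common OTB/LaSOT-style conventions: success counts frames whose bounding-box overlap (IoU) exceeds a threshold, and precision counts frames whose centre location error is within a pixel threshold. The function names, the 0.5 and 20-pixel thresholds, and the toy boxes are all assumptions for illustration. Benchmark toolkits usually aggregate success over a range of thresholds rather than a single one, so this is a simplified view and not the paper's evaluation code.

```python
import math

Box = tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a: Box, b: Box) -> float:
    """Euclidean distance between the two box centres, in pixels."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def success_rate(preds, gts, iou_threshold=0.5) -> float:
    """Percentage of frames whose IoU exceeds iou_threshold."""
    hits = sum(iou(p, g) > iou_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


def precision(preds, gts, pixel_threshold=20.0) -> float:
    """Percentage of frames whose centre error is within pixel_threshold."""
    hits = sum(center_error(p, g) <= pixel_threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


# Toy three-frame sequence (made-up boxes, not data from the paper).
ground_truth = [(0, 0, 10, 10)] * 3
predicted = [(0, 0, 10, 10), (5, 0, 10, 10), (30, 30, 10, 10)]

print(round(success_rate(predicted, ground_truth), 1))
print(round(precision(predicted, ground_truth), 1))
```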
| Algorithm | Accuracy/% | Robustness/% | Expected average overlap (EAO)/% | Average speed/(frame·s⁻¹) |
|---|---|---|---|---|
| DeepSiamFC-Attn | 57.3 | 30.7 | 28.4 | 52.2 |
| DeepSRDCF | 49.8 | 39.3 | 25.3 | 41.6 |
| MDNet | 54.5 | 38.6 | 25.7 | 4.6 |
| DSiam | 51.2 | 64.6 | 24.4 | 44.1 |
| SiamFC | 50.1 | 58.8 | 18.6 | 54.5 |
| CSR-DCF | 44.5 | 66.3 | 24.1 | 13.6 |
| ECO | 48.4 | 32.9 | 26.0 | 77.6 |
| CFNet | 43.7 | 59.4 | 17.8 | 30.5 |
Tab. 3 Evaluation results on the VOT2018 dataset
| Sequence | Selected frames | Challenge attributes |
|---|---|---|
| Bolt | 24, 60, 124 | OCC, DEF, IPR, OPR |
| David3 | 62, 85, 188 | OCC, DEF, OPR, BC |
| Matrix | 44, 53, 75 | IV, SV, OCC, FM, IPR, BC |
| Singer2 | 45, 210, 316 | IV, DEF, IPR, OPR, BC |
| Skating1 | 184, 197, 318 | SV, OCC, DEF, OPR, BC |
| Walking2 | 197, 219, 241 | SV, OCC, LR |
Tab. 4 Challenge attributes included in each test sequence
| Block combination | Precision | Success rate | Block combination | Precision | Success rate |
|---|---|---|---|---|---|
| SiamFC | 77.10 | 58.32 | Block2+Block4 | 81.25 | 63.43 |
| Block1+Block2 | 76.41 | 57.44 | Block2+Block5 | 80.43 | 61.45 |
| Block1+Block3 | 78.27 | 58.52 | Block3+Block4 | 79.35 | 61.55 |
| Block1+Block4 | 78.39 | 58.33 | Block3+Block5 | 78.21 | 61.23 |
| Block1+Block5 | 79.13 | 59.18 | Block4+Block5 | 76.33 | 58.43 |
| Block2+Block3 | 79.44 | 60.32 | | | |
Tab. 5 Comparison of experimental results for different network block combinations on the OTB100 dataset
| Hybrid attention mechanism | A/%↑ | R/%↓ | EAO/%↑ | ΔEAO/% |
|---|---|---|---|---|
| SiamFC | 50.1 | 58.8 | 18.6 | — |
| Base | 51.3 | 47.7 | 21.7 | +3.1 |
| Base+CA | 53.6 | 37.8 | 24.8 | +6.2 |
| Base+SA | 55.9 | 33.1 | 26.0 | +7.4 |
| Base+CA+SA | 57.3 | 30.7 | 28.4 | +9.8 |
Tab. 6 Comparison of experimental results for different hybrid attention mechanisms on the VOT2018 dataset
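
The ΔEAO column in Tab. 6 appears to be each variant's EAO minus the SiamFC EAO. The short sketch below recomputes it from the EAO values in the table; the dictionary and the helper name `eao_gain` are illustrative, and reading ΔEAO as a difference against SiamFC is an inference from the numbers rather than something stated on this page.

```python
# EAO values (%) copied from Tab. 6.
eao = {
    "SiamFC": 18.6,
    "Base": 21.7,
    "Base+CA": 24.8,
    "Base+SA": 26.0,
    "Base+CA+SA": 28.4,
}


def eao_gain(name: str, baseline: str = "SiamFC") -> float:
    """EAO improvement of a variant over the baseline, in percentage points."""
    return round(eao[name] - eao[baseline], 1)


gains = {name: eao_gain(name) for name in eao if name != "SiamFC"}
for name, gain in gains.items():
    print(f"{name}: +{gain}")
```

Under this reading, the recomputed gains match the table: channel attention (CA) and spatial attention (SA) each improve on the base network, and using both gives the largest gain.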
| 1 | BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 850-865. 10.1007/978-3-319-48881-3_56 |
| 2 | VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2805-2813. 10.1109/cvpr.2017.531 |
| 3 | ZHANG Z, PENG H. Deeper and wider Siamese networks for real-time visual tracking[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4591-4600. 10.1109/cvpr.2019.00472 |
| 4 | LI B, YAN J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. 10.1109/cvpr.2018.00935 |
| 5 | WANG Q, TENG Z, XING J L, et al. Learning attentions: residual attentional Siamese network for high performance online visual tracking[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4854-4863. 10.1109/cvpr.2018.00510 |
| 6 | PARK J, WOO S, LEE J Y, et al. BAM: bottleneck attention module[EB/OL]. [2020-10-10]. 10.1007/s11263-019-01283-0 |
| 7 | FEI D S, SONG H H, ZHANG K H. Multi-level feature enhancement for real-time visual tracking[J]. Journal of Computer Applications, 2020, 40(11): 3300-3305. |
| 8 | LI S W, ZHANG X D. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking[J]. Journal of Computer Applications, 2020, 40(8): 2219-2224. 10.11772/j.issn.1001-9081.2019122139 |
| 9 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
| 10 | YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[EB/OL]. [2020-10-10]. |
| 11 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. [2020-10-10]. |
| 12 | CHEN L, ZHANG H, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5659-5667. 10.1109/cvpr.2017.667 |
| 13 | WANG Q, WU B, ZHU P, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. 10.1109/cvpr42600.2020.01155 |
| 14 | HUANG L, ZHAO X, HUANG K. GOT-10k: a large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 43(5): 1562-1577. |
| 15 | WU Y, LIM J, YANG M H. Online object tracking: a benchmark[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 2411-2418. 10.1109/cvpr.2013.312 |
| 16 | WU Y, LIM J, YANG M H. Object tracking benchmark[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848. 10.1109/tpami.2014.2388226 |
| 17 | KRISTAN M, LEONARDIS A, MATAS J, et al. The sixth visual object tracking VOT2018 challenge results[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-53. |
| 18 | FAN H, LIN L, YANG F, et al. LaSOT: a high-quality benchmark for large-scale single object tracking[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5374-5383. 10.1109/cvpr.2019.00552 |
| 19 | LUKEZIC A, VOJIR T, ZAJC L C, et al. Discriminative correlation filter with channel and spatial reliability[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6309-6318. 10.1109/cvpr.2017.515 |
| 20 | NAM H, HAN B. Learning multi-domain convolutional neural networks for visual tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4293-4302. 10.1109/cvpr.2016.465 |
| 21 | GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1763-1771. 10.1109/iccv.2017.196 |
| 22 | DANELLJAN M, BHAT G, KHAN F S, et al. ECO: efficient convolution operators for tracking[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6638-6646. 10.1109/cvpr.2017.733 |
| 23 | GALOOGAHI H K, FAGG A, LUCEY S. Learning background-aware correlation filters for visual tracking[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1135-1143. 10.1109/iccv.2017.129 |
| 24 | DANELLJAN M, HAGER G, KHAN F S, et al. Learning spatially regularized correlation filters for visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 4310-4318. 10.1109/iccv.2015.490 |
| 25 | BERTINETTO L, VALMADRE J, GOLODETZ S, et al. Staple: complementary learners for real-time tracking[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1401-1409. 10.1109/cvpr.2016.156 |
| 26 | HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 37(3): 583-596. 10.1109/tpami.2014.2345390 |
| 27 | DANELLJAN M, HAGER G, KHAN F S, et al. Convolutional features for correlation filter based visual tracking[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision Workshops. Piscataway: IEEE, 2015: 58-66. 10.1109/iccvw.2015.84 |
| [1] | Zhiqiang ZHAO, Peihong MA, Xinhong HEI. Crowd counting method based on dual attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2886-2892. |
| [2] | Jing QIN, Zhiguang QIN, Fali LI, Yueheng PENG. Diagnosis of major depressive disorder based on probabilistic sparse self-attention neural network [J]. Journal of Computer Applications, 2024, 44(9): 2970-2974. |
| [3] | Liting LI, Bei HUA, Ruozhou HE, Kuang XU. Multivariate time series prediction model based on decoupled attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2732-2738. |
| [4] | Kaipeng XUE, Tao XU, Chunjie LIAO. Multimodal sentiment analysis network with self-supervision and multi-layer cross attention [J]. Journal of Computer Applications, 2024, 44(8): 2387-2392. |
| [5] | Pengqi GAO, Heming HUANG, Yonghong FAN. Fusion of coordinate and multi-head attention mechanisms for interactive speech emotion recognition [J]. Journal of Computer Applications, 2024, 44(8): 2400-2406. |
| [6] | Zhonghua LI, Yunqi BAI, Xuejin WANG, Leilei HUANG, Chujun LIN, Shiyu LIAO. Low illumination face detection based on image enhancement [J]. Journal of Computer Applications, 2024, 44(8): 2588-2594. |
| [7] | Shangbin MO, Wenjun WANG, Ling DONG, Shengxiang GAO, Zhengtao YU. Single-channel speech enhancement based on multi-channel information aggregation and collaborative decoding [J]. Journal of Computer Applications, 2024, 44(8): 2611-2617. |
| [8] | Li LIU, Haijin HOU, Anhong WANG, Tao ZHANG. Generative data hiding algorithm based on multi-scale attention [J]. Journal of Computer Applications, 2024, 44(7): 2102-2109. |
| [9] | Song XU, Wenbo ZHANG, Yifan WANG. Lightweight video salient object detection network based on spatiotemporal information [J]. Journal of Computer Applications, 2024, 44(7): 2192-2199. |
| [10] | Dahai LI, Zhonghua WANG, Zhendong WANG. Dual-branch low-light image enhancement network combining spatial and frequency domain information [J]. Journal of Computer Applications, 2024, 44(7): 2175-2182. |
| [11] | Wenliang WEI, Yangping WANG, Biao YUE, Anzheng WANG, Zhe ZHANG. Deep learning model for infrared and visible image fusion based on illumination weight allocation and attention [J]. Journal of Computer Applications, 2024, 44(7): 2183-2191. |
| [12] | Wu XIONG, Congjun CAO, Xuefang SONG, Yunlong SHAO, Xusheng WANG. Handwriting identification method based on multi-scale mixed domain attention mechanism [J]. Journal of Computer Applications, 2024, 44(7): 2225-2232. |
| [13] | Huanhuan LI, Tianqiang HUANG, Xuemei DING, Haifeng LUO, Liqing HUANG. Public traffic demand prediction based on multi-scale spatial-temporal graph convolutional network [J]. Journal of Computer Applications, 2024, 44(7): 2065-2072. |
| [14] | Dianhui MAO, Xuebo LI, Junling LIU, Denghui ZHANG, Wenjing YAN. Chinese entity and relation extraction model based on parallel heterogeneous graph and sequential attention mechanism [J]. Journal of Computer Applications, 2024, 44(7): 2018-2025. |
| [15] | Mei WANG, Xuesong SU, Jia LIU, Ruonan YIN, Shan HUANG. Time series classification method based on multi-scale cross-attention fusion in time-frequency domain [J]. Journal of Computer Applications, 2024, 44(6): 1842-1847. |