Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3884-3890. DOI: 10.11772/j.issn.1001-9081.2021091636
Xiao LYU, Huihui SONG, Jiaqing FAN
Received:
2021-09-17
Revised:
2022-01-11
Accepted:
2022-01-19
Online:
2022-12-21
Published:
2022-12-10
Contact:
Huihui SONG
About author:
LYU Xiao, born in 1996, M. S. candidate. His research interests include video object segmentation and video object tracking.
Abstract:
To address two problems in semi-supervised video object segmentation — the difficulty of balancing segmentation accuracy against segmentation speed, and the inability to effectively distinguish background objects that resemble the foreground — a semi-supervised video object segmentation algorithm based on the fusion of deep and shallow features was proposed. First, a pre-generated coarse mask was used to process the image features, yielding more robust features; then, deep semantic information was extracted by an attention model; finally, the deep semantic information was fused with shallow positional information to obtain more accurate segmentation results. Experiments on several popular datasets show that, with segmentation speed essentially unchanged, the proposed algorithm improves the Jaccard (J) metric on the DAVIS 2016 dataset by 1.8 percentage points over the video object segmentation algorithm that learns Fast and Robust Target Models (FRTM), and improves the overall metric J&F — the mean of the J and F scores — by 2.3 percentage points; on the DAVIS 2017 dataset, it improves J by 1.2 percentage points and J&F by 1.1 percentage points over FRTM. These results demonstrate that the proposed algorithm achieves higher segmentation accuracy while maintaining fast segmentation, and that it robustly distinguishes similar foreground and background objects.
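The three-stage pipeline the abstract describes — coarse-mask modulation of the features, attention over deep semantics, and fusion with shallow positional features — can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the paper's implementation: the nearest-neighbour upsampling, the channel-softmax attention, and all function names here are simplifications chosen for clarity.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling (H, W, C) -> (2H, 2W, C); stands in for
    # the bilinear interpolation a real segmentation network would use.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(shallow, deep, coarse_mask):
    """Fuse shallow positional features with deep semantic features.

    shallow:     (2H, 2W, C) early-layer features with fine spatial detail
    deep:        (H,  W,  C) late-layer features with semantic content
    coarse_mask: (2H, 2W)    pre-generated rough foreground mask in [0, 1]
    """
    # Step 1: modulate the features with the coarse mask, suppressing
    # background responses so the remaining features are more robust.
    masked = shallow * coarse_mask[..., None]
    # Step 2: derive channel-attention weights from the deep features
    # (global average pooling + softmax) — a toy stand-in for the
    # paper's attention model.
    w = deep.mean(axis=(0, 1))                 # (C,)
    w = np.exp(w - w.max())
    w /= w.sum()                               # softmax over channels
    # Step 3: upsample the deep semantics to the shallow resolution and
    # fuse them with the masked shallow (positional) features.
    semantic = upsample2x(deep) * w
    return masked + semantic
```

Running `fuse` on dummy features shows the mask suppressing the background half of the shallow features while the upsampled, attention-weighted semantics are added everywhere.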
Xiao LYU, Huihui SONG, Jiaqing FAN. Semi-supervised video object segmentation via deep and shallow representations fusion[J]. Journal of Computer Applications, 2022, 42(12): 3884-3890.
| Algorithm | J/% | F/% | J&F/% | Frame rate/(frame·s⁻¹) | Frame rate (2080Ti)/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| Ref. [ | 81.4 | 82.1 | 81.8 | 14.30 | ― |
| Ref. [ | 88.7 | 89.9 | 89.3 | 6.25 | ― |
| Ref. [ | 81.1 | 82.2 | 81.7 | 2.20 | ― |
| Ref. [ | 74.0 | 72.9 | 73.5 | 7.14 | ― |
| Ref. [ | 84.9 | 88.6 | 86.8 | 0.01 | ― |
| Ref. [ | 85.6 | 87.5 | 86.6 | 0.22 | ― |
| Ref. [ | 86.1 | 84.9 | 85.5 | 0.08 | ― |
| Ref. [ | 82.6 | 83.6 | 83.1 | 39.00 | ― |
| FRTM | 83.7 | 83.4 | 83.6 | 21.90 | 18.17 |
| Proposed algorithm | 85.5 | 86.3 | 85.9 | ― | 17.76 |

Tab. 1 Evaluation results of different algorithms on DAVIS 2016 validation set
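The J, F, and J&F columns follow the standard DAVIS protocol: J is the region similarity (intersection-over-union of the predicted and ground-truth masks), F is the contour accuracy, and J&F is their arithmetic mean, as the abstract states. A minimal sketch of J and J&F (F, the boundary F-measure, is omitted for brevity):

```python
import numpy as np

def jaccard(pred, gt):
    # Region similarity J: intersection-over-union of two binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return np.logical_and(pred, gt).sum() / union

def j_and_f(j, f):
    # The overall J&F score reported in the tables is the mean of J and F.
    return (j + f) / 2
```

For example, the FRTM row gives `j_and_f(83.7, 83.4) = 83.55`, which the table reports rounded to 83.6.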
| Algorithm | J/% | F/% | J&F/% | Frame rate/(frame·s⁻¹) | Frame rate (2080Ti)/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- |
| Ref. [ | 67.2 | 72.7 | 70.0 | 14.30 | ― |
| Ref. [ | 79.2 | 84.3 | 81.8 | 6.25 | ― |
| Ref. [ | 69.1 | 74.0 | 71.5 | 2.20 | ― |
| Ref. [ | 52.5 | 57.1 | 54.8 | 7.14 | ― |
| Ref. [ | 73.9 | 81.7 | 77.8 | 0.01 | ― |
| Ref. [ | 64.7 | 71.3 | 68.0 | 0.22 | ― |
| Ref. [ | 64.5 | 71.2 | 67.9 | 0.08 | ― |
| Ref. [ | 68.6 | 76.0 | 72.3 | 39.00 | ― |
| FRTM | 73.8 | 79.6 | 76.7 | 21.90 | 18.17 |
| Proposed algorithm | 75.0 | 80.5 | 77.8 | ― | 17.76 |

Tab. 2 Evaluation results of different algorithms on DAVIS 2017 validation set
| Algorithm | J (seen) | J (unseen) | F (seen) | F (unseen) | Overall G |
| --- | --- | --- | --- | --- | --- |
| Ref. [ | 67.8 | 60.8 | 69.5 | 66.2 | 66.1 |
| Ref. [ | 60.1 | 46.1 | 62.7 | 51.4 | 55.2 |
| Ref. [ | 71.4 | 56.5 | ― | ― | 66.9 |
| Ref. [ | ― | ― | ― | ― | 68.2 |
| Proposed algorithm | 68.0 | 60.7 | 71.3 | 68.4 | 67.1 |

Tab. 3 Evaluation results of different algorithms on YouTube-VOS validation set (unit: %)
| Model | J&F | Model | J&F |
| --- | --- | --- | --- |
| Base | 81.4 | Base+Fuse | 85.2 |
| Base+EHOA | 84.6 | Base+EHOA+Fuse | 85.9 |

Tab. 4 Ablation experimental results (unit: %)
| Model | λ1 | λ2 | λ3 | J&F/% |
| --- | --- | --- | --- | --- |
| HOA | ― | ― | ― | 85.0 |
| EHOA | 1.0 | 0.0 | 0.0 | 84.1 |
|  | 0.0 | 1.0 | 0.0 | 84.6 |
|  | 0.0 | 0.0 | 1.0 | 84.8 |
|  | 0.1 | 0.2 | 0.7 | 85.1 |
|  | 0.2 | 0.3 | 0.5 | 85.9 |
|  | 0.3 | 0.3 | 0.4 | 85.3 |
|  | 0.6 | 0.3 | 0.1 | 85.0 |

Tab. 5 Comparison of experimental results of EHOA and HOA models
| Layer features | J&F/% | Layer features | J&F/% |
| --- | --- | --- | --- |
| Layer2 | 85.9 | Layer4 | 85.0 |
| Layer3 | 85.4 | Layer5 | 84.8 |

Tab. 6 Comparison of experimental results with features of different layers
[1] GUO J M, LI Z W, CHEONG L F, et al. Video co-segmentation for meaningful action extraction[C]// Proceedings of the 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2013: 2232-2239. 10.1109/iccv.2013.278
[2] YANG T M, CHEN Z, YUE W J. Spatio-temporal two-stream human action recognition model based on video deep learning[J]. Journal of Computer Applications, 2018, 38(3): 895-899, 915. 10.11772/j.issn.1001-9081.2017071740
[3] HU X M, TONG X C, GUO L, et al. End-to-end autonomous driving model based on deep visual attention neural network[J]. Journal of Computer Applications, 2020, 40(7): 1926-1931. 10.11772/j.issn.1001-9081.2019112054
[4] SALEH K, HOSSNY M, NAHAVANDI S. Kangaroo vehicle collision detection using deep semantic segmentation convolutional neural network[C]// Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications. Piscataway: IEEE, 2016: 1-7. 10.1109/dicta.2016.7797057
[5] OH S W, LEE J Y, SUNKAVALLI K, et al. Fast video object segmentation by reference-guided mask propagation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7376-7385. 10.1109/cvpr.2018.00770
[6] CAELLES S, MANINIS K K, PONT-TUSET J, et al. One-shot video object segmentation[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5320-5329. 10.1109/cvpr.2017.565
[7] MANINIS K K, CAELLES S, CHEN Y H, et al. Video object segmentation without temporal information[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(6): 1515-1530. 10.1109/tpami.2018.2838670
[8] PERAZZI F, KHOREVA A, BENENSON R, et al. Learning video object segmentation from static images[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3491-3500. 10.1109/cvpr.2017.372
[9] XU N, YANG L J, FAN Y C, et al. YouTube-VOS: sequence-to-sequence video object segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11209. Cham: Springer, 2018: 603-619.
[10] VOIGTLAENDER P, LEIBE B. Online adaptation of convolutional neural networks for video object segmentation[C]// Proceedings of the 2017 British Machine Vision Conference. Durham: BMVA Press, 2017: No.116. 10.5244/c.31.116
[11] OH S W, LEE J Y, XU N, et al. Video object segmentation using space-time memory networks[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9225-9234. 10.1109/iccv.2019.00932
[12] VOIGTLAENDER P, CHAI Y N, SCHROFF F, et al. FEELVOS: fast end-to-end embedding learning for video object segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9473-9482. 10.1109/cvpr.2019.00971
[13] HU Y T, HUANG J B, SCHWING A G. VideoMatch: matching based video object segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11212. Cham: Springer, 2018: 56-73.
[14] WANG N, SONG H H, ZHANG K H. Accurate object tracking algorithm based on distance weighting overlap prediction and ellipse fitting optimization[J]. Journal of Computer Applications, 2021, 41(4): 1100-1105. 10.11772/j.issn.1001-9081.2020060869
[15] WANG Q, ZHANG L, BERTINETTO L, et al. Fast online object tracking and segmentation: a unifying approach[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1328-1338. 10.1109/cvpr.2019.00142
[16] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8971-8980. 10.1109/cvpr.2018.00935
[17] PERAZZI F, PONT-TUSET J, McWILLIAMS B, et al. A benchmark dataset and evaluation methodology for video object segmentation[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 724-732. 10.1109/cvpr.2016.85
[18] PONT-TUSET J, PERAZZI F, CAELLES S, et al. The 2017 DAVIS challenge on video object segmentation[EB/OL]. (2018-03-01) [2021-04-03]. 10.1109/cvpr.2017.565
[19] CHEN X, LI Z X, YUAN Y, et al. State-aware tracker for real-time video object segmentation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9381-9390. 10.1109/cvpr42600.2020.00940
[20] ROBINSON A, JÄREMO LAWIN F, DANELLJAN M, et al. Learning fast and robust target models for video object segmentation[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 7404-7413. 10.1109/cvpr42600.2020.00743
[21] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
[22] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: accurate tracking by overlap maximization[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4655-4664. 10.1109/cvpr.2019.00479
[23] CHEN B H, DENG W H, HU J N. Mixed high-order attention network for person re-identification[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 371-381. 10.1109/iccv.2019.00046
[24] LI W, ZHU X T, GONG S G. Person re-identification by deep joint learning of multi-loss classification[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2017: 2194-2200. 10.24963/ijcai.2017/305
[25] LIU J X, NI B B, YAN Y C, et al. Pose transferrable person re-identification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4099-4108. 10.1109/cvpr.2018.00431
[26] ZHONG Z, ZHENG L, CAO D L, et al. Re-ranking person re-identification with k-reciprocal encoding[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 3652-3661. 10.1109/cvpr.2017.389
[27] XIANG X Y, TIAN Y P, ZHANG Y L, et al. Zooming Slow-Mo: fast and accurate one-stage space-time video super-resolution[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3367-3376. 10.1109/cvpr42600.2020.00343
[28] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[29] XU N, YANG L J, FAN Y C, et al. YouTube-VOS: a large-scale video object segmentation benchmark[EB/OL]. (2018-09-06) [2021-08-22]. 10.1007/978-3-030-01228-1_36
[30] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2021-08-22].
[31] JOHNANDER J, DANELLJAN M, BRISSMAN E, et al. A generative appearance model for end-to-end video object segmentation[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8945-8954. 10.1109/cvpr.2019.00916
[32] YANG L J, WANG Y R, XIONG X H, et al. Efficient video object segmentation via network modulation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6499-6507. 10.1109/cvpr.2018.00680
[33] LUITEN J, VOIGTLAENDER P, LEIBE B. PReMVOS: proposal-generation, refinement and merging for video object segmentation[C]// Proceedings of the 2018 Asian Conference on Computer Vision, LNCS 11364. Cham: Springer, 2019: 565-580.