Journal of Computer Applications (《计算机应用》)


Semi-supervised Video Object Segmentation via Attentive Deep and Shallow Representations Fusion

LYU Xiao1, SONG Huihui1, FAN Jiaqing2

  1. Nanjing University of Information Science and Technology
  2. School of Information and Control, Nanjing University of Information Science and Technology, 219 Ningliu Road, Pukou District, Nanjing, Jiangsu
  • Received: 2021-09-17  Revised: 2022-01-11  Online: 2022-04-15  Published: 2022-04-15
  • Contact: SONG Huihui
  • Supported by:
    National Natural Science Foundation of China; Natural Science Foundation of Jiangsu Province



Abstract: To solve the problems in semi-supervised video object segmentation that segmentation accuracy and speed are difficult to balance and that background objects similar to the foreground cannot be effectively distinguished, a semi-supervised video object segmentation algorithm based on deep and shallow feature fusion was proposed. Firstly, a pre-generated rough mask was used to process the image features, yielding more robust features. Secondly, deep semantic information was extracted by an attention model. Finally, the deep semantic information was fused with shallow position information to obtain more accurate segmentation results. The proposed algorithm achieves higher segmentation accuracy while maintaining fast segmentation speed, and effectively distinguishes similar foreground and background objects with strong robustness. Extensive evaluations were conducted on multiple popular datasets. The results show that, with the segmentation speed essentially unchanged, the proposed algorithm improves the Jaccard (J) index by 1.8 percentage points and the comprehensive evaluation index J&F by 2.3 percentage points compared with the algorithm of Learning Fast and Robust Target Models for video object segmentation (FRTM) on the DAVIS 2016 dataset; meanwhile, on the DAVIS 2017 dataset, it improves the J index by 1.2 percentage points and J&F by 1.1 percentage points compared with FRTM. These results fully demonstrate the superior performance of the proposed algorithm in balancing speed and accuracy and in effectively distinguishing foreground from background.
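The abstract's three steps (rough-mask gating of features, attention over deep features, and fusion of deep semantic with shallow position information) can be sketched as follows. This is a minimal NumPy illustration under assumed tensor shapes: the SE-style sigmoid gate stands in for the paper's attention model and nearest-neighbor upsampling for its fusion details, none of which the abstract specifies.

```python
import numpy as np

def apply_rough_mask(feat, mask):
    """Step 1: gate features with a pre-generated rough mask."""
    return feat * mask[None]          # broadcast the (H, W) mask over channels

def channel_attention(feat):
    """Step 2: illustrative SE-style channel attention (learned MLP omitted)."""
    pooled = feat.mean(axis=(1, 2))               # squeeze: (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))          # excite: sigmoid gate
    return feat * gate[:, None, None]

def fuse_deep_shallow(deep, shallow):
    """Step 3: upsample attended deep semantics, add shallow position cues."""
    attended = channel_attention(deep)
    scale = shallow.shape[1] // deep.shape[1]
    up = attended.repeat(scale, axis=1).repeat(scale, axis=2)  # nearest-neighbor
    return up + shallow

rng = np.random.default_rng(0)
deep = apply_rough_mask(rng.random((4, 8, 8)), rng.random((8, 8)) > 0.5)
shallow = rng.random((4, 16, 16))
print(fuse_deep_shallow(deep, shallow).shape)     # (4, 16, 16)
```

In an actual segmentation network, the fused map would feed a decoder head that predicts the per-pixel object mask; here the sketch only shows how the two representation levels are combined at the shallow branch's resolution.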

Key words: video object segmentation, attention, fusion, deep semantic information, shallow position information
