Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3884-3890. DOI: 10.11772/j.issn.1001-9081.2021091636

• Multimedia computing and computer simulation •

Semi-supervised video object segmentation via deep and shallow representations fusion

Xiao LYU¹, Huihui SONG², Jiaqing FAN¹

  1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China
  2. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China
  • Received: 2021-09-17 Revised: 2022-01-11 Accepted: 2022-01-19 Online: 2022-12-21 Published: 2022-12-10
  • Contact: Huihui SONG
  • About the authors: LYU Xiao, born in 1996, a native of Taizhou, Jiangsu, M.S. candidate. His research interests include video object segmentation and video object tracking.
    FAN Jiaqing, born in 1994, a native of Nantong, Jiangsu, Ph.D. candidate. His research interests include video object tracking.
  • Supported by:
    National Natural Science Foundation of China (61872189); Natural Science Foundation of Jiangsu Province (BK20191397)

Abstract:

To address two problems in semi-supervised video object segmentation, namely that segmentation accuracy and speed are difficult to balance and that existing algorithms cannot effectively distinguish the foreground from similar-looking background objects, a semi-supervised video object segmentation algorithm based on deep and shallow feature fusion was proposed. Firstly, a pre-generated rough mask was used to process the image features, yielding more robust features. Secondly, deep semantic information was extracted by an attention model. Finally, the deep semantic information and the shallow position information were fused to obtain more accurate segmentation results. Experiments were conducted on multiple popular datasets. The results show that, with the segmentation speed essentially unchanged, the proposed algorithm improves the Jaccard (J) index by 1.8 percentage points and the overall metric J&F (the mean of J and the F-score) by 2.3 percentage points over the Learning Fast and Robust Target Models for Video Object Segmentation (FRTM) algorithm on the DAVIS 2016 dataset. On the DAVIS 2017 dataset, it improves the J index by 1.2 percentage points and J&F by 1.1 percentage points over FRTM. These results indicate that the proposed algorithm achieves higher segmentation accuracy while maintaining a fast segmentation speed, and that it robustly distinguishes similar foreground and background objects.

Key words: video object segmentation, attention, fusion, deep semantic information, shallow position information
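
The metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes that J is the usual intersection over union of a predicted and a ground-truth binary mask, and that J&F is the arithmetic mean of the mean J and mean F values, as the abstract states. The function names `jaccard` and `j_and_f` are hypothetical, the boundary F-score itself is not computed here, and this is not the authors' evaluation code.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are given as equally sized lists of rows containing 0/1 values.
    """
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_mean, f_mean):
    """Overall score J&F: arithmetic mean of mean J and mean F."""
    return (j_mean + f_mean) / 2.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
j = jaccard(pred, gt)
```

Scores computed this way lie in [0, 1]; an improvement of 1.8 percentage points in J corresponds to a gain of 0.018 on that scale.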

CLC Number: