Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (8): 2242-2246. DOI: 10.11772/j.issn.1001-9081.2018122566

• Artificial Intelligence •

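Video object segmentation method based on dual pyramid network

JIANG Sihao, SONG Huihui, ZHANG Kaihua, TANG Runfa

  1. Jiangsu Key Laboratory of Big Data Analysis Technology (Nanjing University of Information Science and Technology), Nanjing, Jiangsu 210044, China
  • Received: 2019-01-02 Revised: 2019-03-14 Publication date: 2019-08-10 Release date: 2019-03-28
  • Corresponding author: SONG Huihui
  • About the authors: JIANG Sihao (born 1994), male, from Yancheng, Jiangsu, M.S. candidate; main research interest: video object segmentation. SONG Huihui (born 1986), female, from Liaocheng, Shandong, professor, Ph.D.; main research interest: remote sensing image processing. ZHANG Kaihua (born 1983), male, from Rizhao, Shandong, professor, Ph.D., CCF member; main research interests: image segmentation and object tracking. TANG Runfa (born 1995), male, from Yancheng, Jiangsu, M.S. candidate; main research interest: video object segmentation.
  • Supported by:
    National Natural Science Foundation of China (61872189, 61876088); Natural Science Foundation of Jiangsu Province (BK20170040).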



Abstract: To address the difficulty of segmenting a specific object in complex video scenes, a video object segmentation method based on a Dual Pyramid Network (DPN) is proposed. First, a single forward pass of a modulating network adapts the segmentation model to the appearance of the specific object: a modulator is learned from the visual and spatial information of the given object, and it modulates the intermediate layers of the segmentation network so that the network adapts to changes in that object's appearance. Second, global context information is aggregated in the last layer of the segmentation network by a context aggregation method based on different regions. Finally, a left-to-right architecture with lateral connections builds high-level semantic feature maps at all scales. The proposed segmentation network can be trained end-to-end. Extensive experiments show that, on the DAVIS2016 dataset, the method achieves results competitive with state-of-the-art methods that use online fine-tuning, and that it outperforms other methods on the DAVIS2017 dataset.
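The three steps named in the abstract can be illustrated with a small sketch in plain Python. This is not the authors' implementation: the abstract gives no formulas, so the modulation rule (a per-channel scale plus a per-pixel bias), the choice of average pooling over a grid of regions, the nearest-neighbour upsampling, and every function and parameter name below are assumptions made only for illustration.

```python
# Illustrative sketch only: names and formulas are assumptions,
# not the DPN implementation described in the paper.

def modulate(feature, channel_scale, spatial_bias):
    """Scale each channel by a per-channel weight, then add a per-pixel bias.

    feature: list of C channels, each an H x W list of lists.
    channel_scale: C floats (stand-in for what a visual modulator might output).
    spatial_bias: H x W list of lists (stand-in for a spatial modulator's output).
    """
    return [
        [[scale * value + spatial_bias[i][j] for j, value in enumerate(row)]
         for i, row in enumerate(channel)]
        for scale, channel in zip(channel_scale, feature)
    ]


def region_average(channel, bins):
    """Average-pool one H x W channel over a bins x bins grid of regions.

    Assumes bins <= H and bins <= W so that no region is empty.
    """
    height, width = len(channel), len(channel[0])
    pooled = []
    for bi in range(bins):
        r0, r1 = bi * height // bins, (bi + 1) * height // bins
        row = []
        for bj in range(bins):
            c0, c1 = bj * width // bins, (bj + 1) * width // bins
            cells = [channel[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(channel, height, width):
    """Nearest-neighbour resize of one channel to height x width."""
    src_h, src_w = len(channel), len(channel[0])
    return [[channel[i * src_h // height][j * src_w // width] for j in range(width)]
            for i in range(height)]


def aggregate_context(channel, bin_sizes=(1, 2)):
    """Return the input channel followed by one upsampled context map per bin size."""
    height, width = len(channel), len(channel[0])
    context = [upsample_nearest(region_average(channel, b), height, width)
               for b in bin_sizes]
    return [channel] + context


def merge_lateral(coarse, lateral):
    """Upsample the coarser map to the lateral map's size and add element-wise."""
    height, width = len(lateral), len(lateral[0])
    up = upsample_nearest(coarse, height, width)
    return [[up[i][j] + lateral[i][j] for j in range(width)] for i in range(height)]
```

In this sketch, `modulate` stands in for the adaptation of intermediate layers, `aggregate_context` for the region-based context aggregation in the last layer, and `merge_lateral` for one lateral connection in the multi-scale pyramid. A real network would apply learned convolutions around each of these steps.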

Key words: video object segmentation, feature pyramid, Convolutional Neural Network (CNN), deep learning, multi-scale fusion

CLC number: