Journal of Computer Applications, 2021, Vol. 41, Issue (7): 2054-2061. DOI: 10.11772/j.issn.1001-9081.2020091523

Special topic: Multimedia Computing and Computer Simulation


Panoptic segmentation algorithm based on grouped convolution for feature fusion

FENG Xingjie1,2, ZHANG Tianze1

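  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China;
  2. Information Network Center, Civil Aviation University of China, Tianjin 300300, China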
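  • Received: 2020-09-30 Revised: 2020-11-25 Published: 2021-07-10 Released online: 2020-12-14
  • Corresponding author: ZHANG Tianze
  • About the authors: FENG Xingjie (born 1969), male, from Xingtai, Hebei, professor, Ph.D.; research interests: databases, data warehouses, and intelligent information processing. ZHANG Tianze (born 1994), male, from Tianjin, M.S. candidate; research interests: computer vision and image segmentation.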
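  • Supported by:
    Safety Capacity Building Project of the Civil Aviation Administration of China (AADSA201909); Scientific Research Program of Tianjin Municipal Education Commission (2019SK110); Fundamental Research Funds for the Central Universities (3122019009).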

Panoptic segmentation algorithm based on grouped convolution for feature fusion

FENG Xingjie1,2, ZHANG Tianze1   

  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China;
    2. Information Network Center, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2020-09-30 Revised: 2020-11-25 Published: 2021-07-10 Released online: 2020-12-14
  • Supported by:
    This work is partially supported by the CAAC Security Capacity Building Project (AADSA201909), the Scientific Research Program of Tianjin Municipal Education Commission (2019SK110), and the Fundamental Research Funds for the Central Universities (3122019009).

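Abstract: To address the problem that existing network structures are not fast enough for image panoptic segmentation in practical applications, a panoptic segmentation algorithm based on grouped convolution for feature fusion is proposed. First, the classic Residual Network (ResNet) is selected in a bottom-up manner for feature extraction, and Atrous Spatial Pyramid Pooling (ASPP) with different dilation rates is applied to the extracted features to fuse multi-scale features for semantic segmentation and instance segmentation. Then, a single-path grouped convolution upsampling method is proposed, which integrates the semantic and instance features and fuses them while upsampling to a specified size. Finally, loss functions are computed on three branches (the semantic branch, the instance branch, and the instance center point) to obtain a more refined panoptic segmentation output. On the CityScapes dataset, the model was compared with the Attention-guided Unified Network for panoptic segmentation (AUNet), Panoptic Feature Pyramid Network (Panoptic FPN), Single-shot instance Segmentation with Affinity Pyramid (SSAP), Unified Panoptic Segmentation Network (UPSNet), Panoptic-DeepLab, and other methods. Experimental results show that, compared with Panoptic-DeepLab, the best-performing of the compared methods, the proposed model greatly reduces the number of decoder network parameters while achieving a Panoptic Quality (PQ) of 0.565, a decrease of only 0.003; its segmentation quality on objects such as buildings, trains, and bicycles improves by 0.3 to 5.5; Average Precision (AP) and Average Precision at an IoU threshold above 50% (AP50) improve by 0.002 and 0.014, respectively; and the mean Intersection over Union (mIoU) improves by 0.06. The method can therefore increase the speed of image panoptic segmentation, achieves good accuracy on the three metrics PQ, AP, and mIoU, and can effectively complete the panoptic segmentation task.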

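Keywords: image panoptic segmentation, semantic segmentation, instance segmentation, grouped convolution, atrous convolution, spatial pyramid pooling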
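
The abstract reports Panoptic Quality (PQ) as its headline metric without defining it. As a reading aid, the sketch below computes PQ for a single class using the standard definition from the panoptic segmentation literature: the sum of IoUs over matched segment pairs, divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment count as a match only if their IoU exceeds 0.5. The function name and its inputs are illustrative assumptions, not taken from the paper's code.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class PQ from the IoUs of matched (true-positive) segment pairs.

    Hypothetical helper for illustration. `matched_ious` holds one IoU per
    matched prediction/ground-truth pair; each must exceed 0.5, which is the
    matching threshold in the standard PQ definition.
    """
    if any(iou <= 0.5 or iou > 1.0 for iou in matched_ious):
        raise ValueError("matched IoUs must lie in (0.5, 1.0]")
    if num_false_positives < 0 or num_false_negatives < 0:
        raise ValueError("counts must be non-negative")

    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # No segments of this class in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Two matched segments, one unmatched prediction, one unmatched ground truth.
example_pq = panoptic_quality([0.8, 0.6], 1, 1)
```

Benchmark-level PQ is usually the average of this per-class value over all classes; whether the paper's figures are reported on a 0 to 1 or a 0 to 100 scale should be checked against its tables.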

Abstract: To address the problem that existing network structures are not fast enough for image panoptic segmentation in practical applications, a panoptic segmentation algorithm based on grouped convolution for feature fusion was proposed. Firstly, in a bottom-up manner, the classic Residual Network (ResNet) was selected for feature extraction, and Atrous Spatial Pyramid Pooling (ASPP) with different dilation rates was applied to the extracted features to perform multi-scale feature fusion for semantic segmentation and instance segmentation. Secondly, a single-path grouped convolution upsampling method was proposed to integrate the semantic and instance features and to fuse them while upsampling to a specified size. Finally, a more refined panoptic segmentation output was obtained by computing loss functions on the semantic branch, the instance branch, and the instance center point respectively. The model was compared with the Attention-guided Unified Network for panoptic segmentation (AUNet), Panoptic Feature Pyramid Network (Panoptic FPN), Single-shot instance Segmentation with Affinity Pyramid (SSAP), Unified Panoptic Segmentation Network (UPSNet), Panoptic-DeepLab, and other methods on the CityScapes dataset. Compared with Panoptic-DeepLab, the best-performing of the compared models, the proposed model significantly reduced the number of decoding network parameters and achieved a Panoptic Quality (PQ) of 0.565, a slight decrease of 0.003; its segmentation quality on objects such as buildings, trains, and bicycles improved by 0.3 to 5.5; its Average Precision (AP) and Average Precision with target IoU (Intersection over Union) threshold over 50% (AP50) improved by 0.002 and 0.014 respectively; and its mean IoU (mIoU) increased by 0.06. These results indicate that the proposed method improves the speed of image panoptic segmentation, achieves good accuracy on the three metrics PQ, AP, and mIoU, and can effectively complete panoptic segmentation tasks.

Key words: image panoptic segmentation, semantic segmentation, instance segmentation, grouped convolution, atrous convolution, spatial pyramid pooling
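
The abstract attributes the smaller decoder to grouped convolution but gives no layer sizes. The sketch below shows only the general mechanism: splitting a convolution into g groups divides its weight count by g, because each output channel sees only in_channels / g input channels. The channel counts, kernel size, and group number are illustrative assumptions and not the paper's configuration.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a 2D convolution with square kernels.

    Hypothetical helper for illustration. With `groups` > 1, each output
    channel is connected to only in_channels // groups input channels.
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Assumed sizes: 256 -> 256 channels, 5x5 kernel, no bias.
standard_params = conv2d_param_count(256, 256, 5, groups=1, bias=False)
grouped_params = conv2d_param_count(256, 256, 5, groups=8, bias=False)
reduction_factor = standard_params / grouped_params  # equals the group count
```

The saving comes at the cost of no cross-group mixing inside the layer, which is why grouped layers are commonly paired with a pointwise convolution or another fusion step; how the paper handles this is described in its method section, not in the abstract.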

CLC Number: