Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2937-2944. DOI: 10.11772/j.issn.1001-9081.2020121939

Special Issue: Multimedia computing and computer simulation

• Multimedia computing and computer simulation •

Lightweight real-time semantic segmentation algorithm based on separable pyramid

GAO Shiwei, ZHANG Changzhu, WANG Zhuping   

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2020-12-11 Revised: 2021-04-12 Online: 2021-10-10 Published: 2021-07-16
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Shanghai (19ZR1461400).


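  • Corresponding author: ZHANG Changzhu (张长柱)
  • About the authors: GAO Shiwei (高世伟, born 1997), male, from Chuzhou, Anhui, is a master's student whose research interests include deep learning and image semantic segmentation. ZHANG Changzhu (张长柱, born 1984), male, from Qufu, Shandong, is an associate professor with a Ph.D. whose research interests include intelligent control and deep learning. WANG Zhuping (王祝萍, born 1973), female, from Shanghai, is a professor with a Ph.D. whose research interests include intelligent control and deep learning.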

Abstract: Existing semantic segmentation algorithms have too many parameters and a large memory footprint, which makes it difficult for them to meet the requirements of real-world applications such as autonomous driving. To address this problem, a novel, effective and lightweight real-time semantic segmentation algorithm based on the Separable Pyramid Module (SPM) was proposed. Firstly, factorized convolution and dilated convolution were arranged in the form of a feature pyramid to construct the bottleneck structure, providing a simple but effective way to extract local and contextual information. Then, a Context Channel Attention (CCA) module based on computer vision attention was proposed, which uses deep semantic features to reweight the channels of shallow feature maps and thereby refines the segmentation results. Experimental results show that, without pre-training or any additional processing, the proposed algorithm achieves a mean Intersection-over-Union (mIoU) of 71.86% on the Cityscapes test set at a speed of 91 Frames Per Second (FPS). Compared with Efficient Residual Factorized ConvNet (ERFNet), the proposed algorithm improves mIoU by 3.86 percentage points and runs 2.2 times as fast. Compared with the latest Light-weighted Network with Efficient Reduced Non-local operation for real-time semantic segmentation (LRNNet), its mIoU is 0.34 percentage points lower, but its processing speed is 20 FPS higher. These results indicate that the proposed algorithm is valuable for tasks that demand efficient and accurate street scene image segmentation, such as autonomous driving.
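
As a rough illustration of why factorized and dilated convolutions suit a lightweight design, the sketch below counts the weights of a standard 3×3 convolution against a 3×1 plus 1×3 factorized pair, and computes the span covered by a dilated 3-tap kernel. The channel width of 64 and the dilation rates are hypothetical values chosen for the example; they are not taken from the paper, and the helper names `conv_params` and `effective_kernel` are illustrative only.

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of a k_h x k_w convolution (bias terms ignored)."""
    return k_h * k_w * c_in * c_out


def effective_kernel(k, dilation):
    """Span, in pixels, covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


channels = 64  # hypothetical channel width, not taken from the paper

# One standard 3x3 convolution.
standard = conv_params(3, 3, channels, channels)

# Factorized form: a 3x1 convolution followed by a 1x3 convolution.
factorized = (conv_params(3, 1, channels, channels)
              + conv_params(1, 3, channels, channels))

print("standard 3x3 weights:", standard)
print("factorized 3x1 + 1x3 weights:", factorized)

# Dilation widens the covered span without adding weights.
for d in (1, 2, 4, 8):  # hypothetical dilation rates
    print("dilation", d, "covers", effective_kernel(3, d), "pixels")
```

With equal input and output channel counts, the factorized pair uses two thirds of the weights of the 3×3 kernel, while dilation enlarges the context seen by each layer at no extra parameter cost.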

Key words: real-time semantic segmentation, Deep Convolutional Net (DeepLab), factorized convolution, dilated convolution, channel attention mechanism
