Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3077-3085. DOI: 10.11772/j.issn.1001-9081.2022091438

Special topic: Artificial Intelligence

High-low dimensional feature guided real-time semantic segmentation network

Zixing YU1, Shaojun QU1, Xin HE2, Zhuo WANG1

  1. College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan 410081, China
  2. Hunan Novasky Electronic Technology Company Limited, Changsha, Hunan 410221, China
  • Received: 2022-09-29 Revised: 2022-12-06 Accepted: 2022-12-12 Online: 2023-03-23 Published: 2023-10-10
  • Contact: Shaojun QU
  • About the authors: YU Zixing, born in 1997 in Zhuzhou, Hunan, M. S. candidate, CCF member. His research interests include computer vision and deep learning.
    QU Shaojun, born in 1979 in Yongshun, Hunan, Ph. D., senior experimentalist, CCF member. His research interests include image segmentation, computer vision and deep learning. qshj@hunnu.edu.cn
    HE Xin, born in 1987 in Shaoyang, Hunan, Ph. D. His research interests include deep learning and radar-vision fusion.
    WANG Zhuo, born in 2000 in Shaoyang, Hunan, M. S. candidate, CCF member. Her research interests include computer vision and deep learning.
  • Supported by:
    National Natural Science Foundation of China (12071126)

Abstract:

Most semantic segmentation networks use bilinear interpolation to upsample the high-level feature map to the resolution of the low-level feature map before fusing the two. As a result, part of the high-level semantic information is not spatially aligned with the low-level feature map, and semantic information is lost. To address this problem, a High-Low dimensional Feature Guided real-time semantic segmentation Network (HLFGNet) is proposed as an improvement on the Bilateral Segmentation Network (BiSeNet). First, a High-Low dimensional Feature Guided Module (HLFGM) is proposed, which uses the spatial position information of the low-level feature map to guide the displacement of high-level semantic information during upsampling. At the same time, the high-level feature map is used to obtain strong feature representations, which are combined with an attention mechanism to remove redundant edge detail from the low-level feature map and to reduce pixel misclassification. Second, an improved Pyramid Pooling Guided Module (PPGM) is introduced to capture global contextual information and to strengthen the fusion of local contextual information at different scales. Experimental results on the Cityscapes validation set and the CamVid test set show that HLFGNet achieves a mean Intersection over Union (mIoU) of 76.67% and 70.90%, at 75.0 and 96.2 frames per second, respectively. Compared with BiSeNet, the mIoU of HLFGNet is higher by 1.76 and 3.40 percentage points, respectively. HLFGNet can therefore identify scene information fairly accurately while meeting real-time requirements.
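The alignment issue described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' HLFGM: in the paper the displacement is produced by the module itself, whereas here the per-pixel `offsets` array is simply passed in, and the function names `bilinear_sample` and `guided_upsample` are hypothetical. With zero offsets the function reduces to plain bilinear upsampling (corner-aligned sampling grid, an assumption of this sketch); with non-zero offsets each output pixel reads the high-level map at a shifted location.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (C, H, W) at fractional coordinates ys, xs (each h x w)."""
    _, H, W = feat.shape
    ys = np.clip(ys, 0.0, H - 1.0)          # keep sampling points inside the map
    xs = np.clip(xs, 0.0, W - 1.0)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy, wx = ys - y0, xs - x0               # interpolation weights in [0, 1]
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def guided_upsample(high, out_hw, offsets=None):
    """Upsample high (C, H, W) to out_hw = (h, w).

    offsets (2, h, w), in units of source pixels, shifts every sampling point
    (row offsets first, then column offsets). Assumption: offsets are given;
    a real network would predict them from the low- and high-level features.
    """
    _, H, W = high.shape
    h, w = out_hw
    ys, xs = np.meshgrid(np.linspace(0, H - 1, h),
                         np.linspace(0, W - 1, w), indexing="ij")
    if offsets is not None:
        ys = ys + offsets[0]
        xs = xs + offsets[1]
    return bilinear_sample(high, ys, xs)

# Toy high-level map whose value equals the column index.
ramp = np.tile(np.arange(4.0), (1, 3, 1))   # shape (1, 3, 4)
plain = guided_upsample(ramp, (5, 7))       # plain bilinear upsampling
offsets = np.zeros((2, 5, 7))
offsets[1] = 0.5                            # shift every sample half a pixel to the right
shifted = guided_upsample(ramp, (5, 7), offsets)
```

In this toy case `plain` reproduces the ramp at the finer resolution, while `shifted` reads values half a source pixel further right, which is the kind of correction a learned displacement could apply before fusion with the low-level map.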

Key words: real-time semantic segmentation, upsampling, attention mechanism, pyramid pooling, contextual information
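For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean Intersection over Union is commonly computed from a confusion matrix. The function names and the `ignore_index=255` default are assumptions of this sketch (255 is a widespread convention for unlabeled pixels), not details taken from the paper's evaluation code.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (true class, predicted class) pair, skipping ignored labels."""
    mask = target != ignore_index
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    valid = union > 0                        # skip classes absent from both pred and target
    return float((inter[valid] / union[valid]).mean())

target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(confusion_matrix(pred, target, num_classes=2))
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so `miou` is 7/12. The improvements quoted in the abstract are percentage points, that is, absolute differences between two such mIoU values expressed in percent.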

CLC number: