Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (10): 3058-3066.DOI: 10.11772/j.issn.1001-9081.2023101424

• Artificial intelligence •

Dual-branch real-time semantic segmentation network based on detail enhancement

Qiumei ZHENG, Weiwei NIU, Fenghua WANG, Dan ZHAO

  1. College of Computer Science and Technology, China University of Petroleum (East China), Qingdao Shandong 266580, China
  • Received: 2023-10-23 Revised: 2024-02-28 Accepted: 2024-03-08 Online: 2024-10-15 Published: 2024-10-10
  • Contact: Weiwei NIU
  • About author: ZHENG Qiumei, born in 1964, professor. Her research interests include computer vision, image processing, and digital watermarking.
    NIU Weiwei, born in 1998, M. S. Her research interests include computer vision and semantic segmentation.
    WANG Fenghua, born in 1979, Ph. D., lecturer. His research interests include computer vision and embedded software.
    ZHAO Dan, born in 1998, M. S. Her research interests include digital image watermarking.
  • Supported by:
    National Natural Science Foundation of China (52074341); Fundamental Research Funds for the Central Universities (19CX02030A)

基于细节增强的双分支实时语义分割网络

郑秋梅, 牛薇薇, 王风华, 赵丹

  1. 中国石油大学(华东) 计算机科学与技术学院,山东 青岛 266580
  • 通讯作者: 牛薇薇
  • 作者简介:郑秋梅(1964—),女,山东东营人,教授,主要研究方向:计算机视觉、图像处理、数字水印
    牛薇薇(1998—),女,河北廊坊人,硕士,主要研究方向:计算机视觉、语义分割 z21070261@s.upc.edu.cn
    王风华(1979—),男,山东泰安人,讲师,博士,主要研究方向:计算机视觉、嵌入式软件
    赵丹(1998—),女,河南商丘人,硕士,主要研究方向:数字图像水印。
  • 基金资助:
    国家自然科学基金资助项目(52074341);中央高校基本科研业务费专项资金资助项目(19CX02030A)

Abstract:

Real-time semantic segmentation methods often use a dual-branch structure to preserve the shallow spatial information and the deep semantic information of an image separately. However, current dual-branch real-time semantic segmentation methods focus on mining semantic features and neglect the preservation of spatial features, so the network cannot accurately capture detail features such as object boundaries and textures, and the final segmentation quality suffers. To solve these problems, a Dual-Branch real-time semantic segmentation Network based on Detail Enhancement (DEDBNet) was proposed to enhance spatial detail information at multiple stages. First, a Detail-Enhanced Bidirectional Interaction Module (DEBIM) was proposed. In the interaction stage between branches, a lightweight spatial attention mechanism was used to enhance the ability of high-resolution feature maps to express detail information and to promote the flow of spatial detail features between the high-resolution and low-resolution branches, thereby improving the network's ability to learn detail information. Second, a Local Detail Attention Feature Fusion (LDAFF) module was designed to model global semantic information and local spatial information simultaneously during feature fusion at the ends of the two branches, thereby solving the problem of detail discontinuity between feature maps at different levels. In addition, a boundary loss was introduced to guide the shallow layers of the network to learn object boundary information without affecting the speed of the model. The proposed network achieved a mean Intersection over Union (mIoU) of 78.2% on the Cityscapes validation set at a speed of 92.3 frame/s, and an mIoU of 79.2% on the CamVid test set at a speed of 202.8 frame/s; compared with Deep Dual Resolution Network (DDRNet-23-slim), the mIoU of the proposed network increased by 1.1 and 4.5 percentage points, respectively. The experimental results show that DEDBNet can accurately segment scene images and meet real-time requirements.
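
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the toy labels, and the ignore label 255 (a convention used for unlabeled pixels in datasets such as Cityscapes) are assumptions made for illustration.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flat label sequences.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the metric
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average the per-class IoU = TP / (TP + FP + FN) over observed classes."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # true c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp    # predicted c, true other
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
target = [0, 0, 1, 1, 2, 255]
pred = [0, 1, 1, 1, 2, 2]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. A gain expressed in percentage points, as in the abstract, is the plain difference between two such scores multiplied by 100.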

Key words: real-time semantic segmentation, dual-branch, detail enhancement, feature fusion, attention mechanism

摘要:

实时语义分割方法常利用双分支结构分别保存图像的浅层空间信息和深层语义信息。然而,当前基于双分支结构的实时语义分割方法重点研究语义特征的挖掘,忽略了空间特征的保持,导致网络无法精准地捕捉图像内物体的边界和纹理等细节特征,最终分割效果欠佳。针对以上问题,提出基于细节增强的双分支实时语义分割网络(DEDBNet),多阶段增强空间细节信息。首先,提出细节增强双向交互(DEBIM)模块,在分支间的交互阶段使用轻量空间注意力机制增强高分辨率特征图对细节信息的表达能力,促进空间细节特征在高低两分支上的流动,以加强网络对细节信息的学习能力;其次,设计局部细节注意力特征融合模块(LDAFF),在两分支末端特征融合的过程中同时建模全局语义信息和局部空间信息,解决不同层次特征图之间细节不连续的问题;此外,引入边界损失,在不影响模型速度的情况下引导网络浅层学习物体边界信息。所提网络在Cityscapes验证集上以92.3 frame/s的帧速率(FPS)获得78.2%的平均交并比(mIoU),在CamVid测试集上以202.8 frame/s获得79.2%的mIoU;与深度双分辨率网络(DDRNet-23-slim)相比,mIoU分别提高了1.1和4.5个百分点。实验结果表明,DEDBNet能够准确地分割场景图像,且满足实时性要求。

关键词: 实时语义分割, 双分支, 细节增强, 特征融合, 注意力机制

CLC Number: