Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (10): 3058-3066. DOI: 10.11772/j.issn.1001-9081.2023101424

Dual-branch real-time semantic segmentation network based on detail enhancement

Qiumei ZHENG, Weiwei NIU, Fenghua WANG, Dan ZHAO
Received: 2023-10-23
Revised: 2024-02-28
Accepted: 2024-03-08
Online: 2024-10-15
Published: 2024-10-10
Contact: Weiwei NIU
About author: ZHENG Qiumei, born in 1964 in Dongying, Shandong, professor. Her research interests include computer vision, image processing, and digital watermarking.
Abstract:
Real-time semantic segmentation methods often adopt a dual-branch structure to preserve the shallow spatial information and the deep semantic information of an image separately. However, existing dual-branch real-time semantic segmentation methods concentrate on mining semantic features while neglecting the preservation of spatial features, so the network cannot accurately capture detail features such as object boundaries and textures, and the final segmentation quality suffers. To address these problems, a Detail-Enhanced Dual-Branch real-time semantic segmentation Network (DEDBNet) was proposed to enhance spatial detail information at multiple stages. First, a Detail-Enhanced Bidirectional Interaction Module (DEBIM) was proposed: during the interaction stage between branches, a lightweight spatial attention mechanism strengthens the ability of high-resolution feature maps to express detail information and promotes the flow of spatial detail features between the high- and low-resolution branches, thereby improving the network's ability to learn details. Second, a Local Detail Attention Feature Fusion (LDAFF) module was designed to model global semantic information and local spatial information simultaneously when fusing the features at the ends of the two branches, solving the problem of detail discontinuity between feature maps at different levels. In addition, a boundary loss was introduced to guide the shallow layers of the network to learn object boundary information without affecting inference speed. The proposed network achieves a mean Intersection over Union (mIoU) of 78.2% at a frame rate of 92.3 frame/s on the Cityscapes validation set, and an mIoU of 79.2% at 202.8 frame/s on the CamVid test set; compared with the Deep Dual-Resolution Network (DDRNet-23-slim), these mIoU values are 1.1 and 4.5 percentage points higher, respectively. Experimental results show that DEDBNet segments scene images accurately while meeting real-time requirements.
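As a rough illustration of the two ideas described above — a lightweight spatial-attention gate on the high-resolution branch and a bidirectional exchange of features between the two branches — the following PyTorch-style sketch shows one plausible form of such an interaction step. It is an assumption-laden sketch, not the authors' implementation: the module names (SpatialAttentionGate, BilateralExchange), channel-matching 1×1 convolutions, and resolution choices are hypothetical.

```python
# Minimal PyTorch-style sketch (NOT the authors' code): a lightweight spatial-attention
# gate on the high-resolution detail branch plus one bidirectional exchange step between
# the two branches, loosely following the DEBIM description in the abstract.
# All names, channel counts, and the pooling/upsampling choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttentionGate(nn.Module):
    """CBAM-style spatial attention: pool over channels, 7x7 conv, sigmoid re-weighting."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)        # B×1×H×W, average over channels
        max_map, _ = x.max(dim=1, keepdim=True)      # B×1×H×W, max over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                              # emphasize boundary/texture positions


class BilateralExchange(nn.Module):
    """One detail<->semantic interaction step between a high- and a low-resolution branch."""
    def __init__(self, detail_ch: int, semantic_ch: int):
        super().__init__()
        self.gate = SpatialAttentionGate()
        self.sem_to_det = nn.Conv2d(semantic_ch, detail_ch, kernel_size=1, bias=False)
        self.det_to_sem = nn.Conv2d(detail_ch, semantic_ch, kernel_size=1, bias=False)

    def forward(self, detail: torch.Tensor, semantic: torch.Tensor):
        detail_enh = self.gate(detail)                                   # enhanced detail features
        sem_up = F.interpolate(self.sem_to_det(semantic),
                               size=detail.shape[-2:], mode="bilinear",
                               align_corners=False)                      # semantic -> detail branch
        det_down = F.adaptive_avg_pool2d(self.det_to_sem(detail_enh),
                                         semantic.shape[-2:])            # detail -> semantic branch
        return detail_enh + sem_up, semantic + det_down


if __name__ == "__main__":
    block = BilateralExchange(detail_ch=64, semantic_ch=128)
    d, s = torch.randn(1, 64, 128, 256), torch.randn(1, 128, 16, 32)
    d_out, s_out = block(d, s)
    print(d_out.shape, s_out.shape)                                      # shape sanity check
```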
Qiumei ZHENG, Weiwei NIU, Fenghua WANG, Dan ZHAO. Dual-branch real-time semantic segmentation network based on detail enhancement[J]. Journal of Computer Applications, 2024, 44(10): 3058-3066.
Tab. 1  Influence of proposed modules on algorithm performance

| Baseline | DEBIM | LDAFF | Boundary loss | mIoU/% | Params/MB | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| √ |  |  |  | 77.1 | 5.71 | 101.6 |
| √ | √ |  |  | 77.6 | 5.71 | 100.3 |
| √ | √ | √ |  | 77.8 | 5.73 | 92.3 |
| √ | √ | √ | √ | 78.2 | 5.73 | 92.3 |
Tab. 2  Influence of different feature fusion methods on algorithm performance

| ADD | FFM | LDAFF | mIoU/% | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|
| √ |  |  | 77.6 | 100.32 |
|  | √ |  | 77.6 | 81.40 |
|  |  | √ | 78.2 | 92.30 |
Tab. 3  Influence of boundary loss weight on algorithm performance

| β | mIoU/% | β | mIoU/% |
|---|---|---|---|
| 0.10 | 77.9 | 0.50 | 77.9 |
| 0.20 | 78.1 | 0.75 | 77.8 |
| 0.25 | 78.2 | 1.00 | 77.6 |
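The weight β swept in Tab. 3 is most naturally read as the coefficient of the boundary term in a weighted-sum training objective. The exact objective is not reproduced on this page, so the formulation below is an assumption rather than the paper's equation; only the value of β comes from Tab. 3.

```latex
% Assumed form of the training objective; \beta is the boundary-loss weight from Tab. 3.
\mathcal{L}_{\text{total}} \;=\; \mathcal{L}_{\text{seg}} \;+\; \beta\,\mathcal{L}_{\text{boundary}},
\qquad \beta = 0.25 \ \text{gives the best result in Tab. 3 (mIoU } = 78.2\%\text{)}.
```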
Tab. 4  Comparison results of different methods on Cityscapes dataset

| Method | Input image size | Params/MB | GPU | mIoU (val)/% | mIoU (test)/% | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| ICNet[19] | 2 048×1 024 | 26.50 | TitanX M | — | 69.5 | 30.0 |
| BiSeNet[18] | 1 536×768 | 5.80 | GTX 1080Ti | 69.0 | 68.4 | 105.8 |
| BiSeNetV2[28] | 1 024×512 | — | GTX 1080Ti | 73.4 | 72.6 | 156.0 |
| STDC1-Seg75[29] | 1 536×768 | 14.20 | GTX 2080Ti | 74.5 | 75.3 | 74.6 |
| STDC2-Seg75[29] | 1 536×768 | 22.20 | GTX 2080Ti | 77.0 | 76.8 | 73.5 |
| PP-LiteSeg-T2[43] | 1 536×768 | — | GTX 2080Ti | 76.0 | 74.9 | 91.5 |
| PP-LiteSeg-B2[43] | 1 536×768 | — | GTX 2080Ti | 77.8 | 77.1 | 79.1 |
| HLFGNet[23] | 2 048×1 024 | 50.53 | GTX 2080Ti | 76.6 | 75.4 | 75.0 |
| MSFNet[24] | 2 048×1 024 | — | GTX 2080Ti | — | 77.1 | 41.0 |
| SGCPNet[25] | 2 048×1 024 | 0.61 | GTX 2080Ti | — | 70.9 | 106.5 |
| DDRNet-23-slim[22] | 2 048×1 024 | 5.71 | GTX 2080Ti | 77.1 | 77.4 | 101.6 |
| RTFormer-slim[44] | 2 048×1 024 | 4.80 | GTX 2080Ti | 76.1 | 75.4 | 89.6 |
| DEDBNet | 2 048×1 024 | 5.73 | GTX 2080Ti | 78.2 | 77.8 | 92.3 |
Tab. 5  Accuracy for each category on Cityscapes test set (unit: %)

| Category | BiSeNet[18] | DDRNet-23-slim[22] | DEDBNet |
|---|---|---|---|
| mIoU | 74.5 | 77.1 | 78.2 |
| road | 98.2 | 98.1 | 98.2 |
| sidewalk | 83.2 | 84.4 | 85.4 |
| building | 91.6 | 92.1 | 92.5 |
| wall | 45.0 | 56.8 | 58.1 |
| fence | 50.7 | 60.2 | 61.9 |
| pole | 62.0 | 62.7 | 63.6 |
| traffic light | 71.3 | 68.7 | 69.5 |
| traffic sign | 74.6 | 76.6 | 76.6 |
| vegetation | 92.8 | 92.1 | 92.3 |
| terrain | 70.4 | 66.7 | 64.9 |
| sky | 94.9 | 94.6 | 94.6 |
| person | 83.4 | 80.8 | 80.6 |
| rider | 66.2 | 62.1 | 59.4 |
| car | 94.9 | 94.8 | 94.9 |
| truck | 61.4 | 80.3 | 83.3 |
| bus | 75.5 | 85.7 | 89.5 |
| train | 67.0 | 78.8 | 80.8 |
| motorcycle | 61.2 | 53.8 | 61.9 |
| bicycle | 72.3 | 74.6 | 75.7 |
Tab. 6  Performance comparison of different methods on CamVid test set

| Method | GPU | mIoU/% | Frame rate/(frame·s⁻¹) |
|---|---|---|---|
| ICNet[19] | TitanX | 67.1 | 27.8 |
| BiSeNet1[18] | GTX 1080Ti | 65.6 | 175.0 |
| BiSeNet2[18] | GTX 1080Ti | 68.7 | 116.3 |
| BiSeNetV2[28] | GTX 1080Ti | 72.4 | 124.5 |
| BiSeNetV2-L[28] | GTX 1080Ti | 73.2 | 32.7 |
| STDC1-Seg[29] | RTX 2080Ti | 73.0 | 125.6 |
| STDC2-Seg[29] | RTX 2080Ti | 73.9 | 100.5 |
| HLFGNet[23] | RTX 2080Ti | 70.9 | 96.2 |
| MSFNet[24] | RTX 2080Ti | 75.4 | 91.0 |
| SGCPNet[25] | RTX 2080Ti | 69.0 | 278.4 |
| DDRNet-23-slim[22] | RTX 2080Ti | 74.7 | 217.0 |
| DEDBNet | RTX 2080Ti | 79.2 | 202.8 |
| 1 | LIU Z, LI X, LUO P, et al. Deep learning Markov random field for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(8): 1814-1828. | 
| 2 | JING L, CHEN Y, TIAN Y. Coarse-to-fine semantic segmentation from image-level labels [J]. IEEE Transactions on Image Processing, 2020, 29: 225-236. | 
| 3 | REN X, AHMAD S, ZHANG L, et al. Task decomposition and synchronization for semantic biomedical image segmentation [J]. IEEE Transactions on Image Processing, 2020, 29: 7497-7510. | 
| 4 | SAHA M, CHAKRABORTY C. Her2Net: a deep framework for semantic segmentation and classification of cell membranes and nuclei in breast cancer evaluation [J]. IEEE Transactions on Image Processing, 2018, 27(5): 2189-2200. | 
| 5 | ROMERA E, ÁLVAREZ J M, BERGASA L M, et al. ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation [J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. | 
| 6 | LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3431-3440. | 
| 7 | CHEN L-C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. (2014-12-22) [2023-04-10]. . | 
| 8 | CHEN L-C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. | 
| 9 | CHEN L-C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. (2017-06-17) [2023-04-10]. . | 
| 10 | CHEN L-C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 833-851. | 
| 11 | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation [C]// Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241. | 
| 12 | ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6230-6239. | 
| 13 | WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. | 
| 14 | LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5168-5177. | 
| 15 | WANG J, SUN K, CHENG T, et al. Deep high-resolution representation learning for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364. | 
| 16 | PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation [EB/OL]. (2016-06-07) [2023-04-10]. . | 
| 17 | WEN K, TANG W W, XIONG J C. Real-time segmentation algorithm based on attention mechanism and effective factorized convolution [J]. Journal of Computer Applications, 2022, 42(9): 2659-2666. |
| 18 | YU C, WANG J, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 334-349. | 
| 19 | ZHAO H, QI X, SHEN X, et al. ICNet for real-time semantic segmentation on high-resolution images [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 418-434. | 
| 20 | POUDEL P P K, LIWICKI S, CIPOLLA R. Fast-SCNN: fast semantic segmentation network [EB/OL]. (2019-02-12) [2023-04-15]. . | 
| 21 | LI H, XIONG P, FAN H, et al. DFANet: deep feature aggregation for real-time semantic segmentation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9514-9523. | 
| 22 | PAN H, HONG Y, SUN W, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes [J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(3): 3448-3460. | 
| 23 | YU Z X, QU S J, HE X, et al. High-low dimensional feature guided real-time semantic segmentation network [J]. Journal of Computer Applications, 2023, 43(10): 3077-3085. |
| 24 | SI H, ZHANG Z, LV F, et al. Real-time semantic segmentation via multiply spatial fusion network [C]// Proceedings of the 2020 British Machine Vision Virtual Conference. Durham: British Machine Vision Association, 2020: 0678-0689. | 
| 25 | HAO S, ZHOU Y, GUO Y, et al. Real-time semantic segmentation via spatial-detail guided context propagation [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022(Early Access): 1-12. | 
| 26 | CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3213-3223. | 
| 27 | BROSTOW G J, FAUQUEUR J, CIPOLLA R. Semantic object classes in video: a high-definition ground truth database [J]. Pattern Recognition Letters, 2009, 30(2): 88-97. | 
| 28 | YU C, GAO C, WANG J, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation [J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. | 
| 29 | FAN M, LAI S, HUANG J, et al. Rethinking BiSeNet for real-time semantic segmentation [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 9711-9720. | 
| 30 | FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3141-3149. | 
| 31 | MA H, YANG H, HUANG D. Boundary guided context aggregation for semantic segmentation [C]// Proceedings of the 2021 British Machine Vision Virtual Conference. Durham: British Machine Vision Association, 2021: 0091-0103. | 
| 32 | HUO Z Q, JIA H Y, QIAO Y X, et al. Boundary-aware real-time semantic segmentation network [J]. Computer Engineering and Applications, 2022, 58(17): 165-173. |
| 33 | XU J, XIONG Z, BHATTACHARYYA S P. PIDNet: a real-time semantic segmentation network inspired by PID controllers [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 19529-19539. | 
| 34 | WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. | 
| 35 | WU Y, JIANG J, HUANG Z, et al. FPANet: feature pyramid aggregation network for real-time semantic segmentation[J]. Applied Intelligence, 2022, 52(3): 3319-3336. | 
| 36 | LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. | 
| 37 | HUANG Z, WEI Y, WANG X, et al. AlignSeg: feature-aligned segmentation networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 550-557. | 
| 38 | LI X, ZHAO H, HAN L, et al. GFF: gated fully fusion for semantic segmentation [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2020: 11418-11425. | 
| 39 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780. | 
| 40 | ORŠIC M, KREŠO I, BEVANDIC P, et al. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 12599-12608. | 
| 41 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252. | 
| 42 | SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 761-769. | 
| 43 | PENG J, LIU Y, TANG S, et al. PP-LiteSeg: a superior real-time semantic segmentation model [EB/OL]. (2022-04-06) [2023-10-24]. . | 
| 44 | WANG J, GOU C, WU Q, et al. RTFormer: efficient design for real-time semantic segmentation with Transformer [C]// Proceedings of the 2022 International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2022:7423-7436. | 