Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (10): 3058-3066. DOI: 10.11772/j.issn.1001-9081.2023101424
• Artificial intelligence •
Dual-branch real-time semantic segmentation network based on detail enhancement
Qiumei ZHENG, Weiwei NIU, Fenghua WANG, Dan ZHAO
Received: 2023-10-23
Revised: 2024-02-28
Accepted: 2024-03-08
Online: 2024-10-15
Published: 2024-10-10
Contact: Weiwei NIU
About author: ZHENG Qiumei, born in 1964 in Dongying, Shandong, professor. Her research interests include computer vision, image processing, and digital watermarking.
Qiumei ZHENG, Weiwei NIU, Fenghua WANG, Dan ZHAO. Dual-branch real-time semantic segmentation network based on detail enhancement[J]. Journal of Computer Applications, 2024, 44(10): 3058-3066.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023101424
Tab. 1 Influence of proposed modules on algorithm performance

| Baseline network | DEBIM | LDAFF | Boundary loss | mIoU/% | Parameters/MB | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| √ | | | | 77.1 | 5.71 | 101.6 |
| √ | √ | | | 77.6 | 5.71 | 100.3 |
| √ | √ | √ | | 77.8 | 5.73 | 92.3 |
| √ | √ | √ | √ | 78.2 | 5.73 | 92.3 |
Tab. 2 Influence of different feature fusion methods on algorithm performance

| ADD | FFM | LDAFF | mIoU/% | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|
| √ | | | 77.6 | 100.32 |
| | √ | | 77.6 | 81.40 |
| | | √ | 78.2 | 92.30 |
Tab. 3 Influence of boundary loss weight on algorithm performance

| β | mIoU/% | β | mIoU/% |
|---|---|---|---|
| 0.10 | 77.9 | 0.50 | 77.9 |
| 0.20 | 78.1 | 0.75 | 77.8 |
| 0.25 | 78.2 | 1.00 | 77.6 |
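For readers reconstructing the ablation in Tab. 3, the weight β scales an auxiliary boundary loss against the main segmentation loss, with β = 0.25 giving the best mIoU. A minimal sketch of such a weighted combination follows; the cross-entropy and binary-cross-entropy terms, tensor shapes, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def combined_loss(seg_logits, seg_target, bnd_logits, bnd_target, beta=0.25):
    """Weighted sum L = L_seg + beta * L_boundary (sketch, not the paper's exact loss).

    seg_logits: (N, C, H, W) class scores; seg_target: (N, H, W) integer labels.
    bnd_logits: (N, 1, H, W) boundary scores; bnd_target: (N, 1, H, W) float {0, 1} map.
    """
    # Main per-pixel classification loss; 255 marks unlabeled pixels in Cityscapes.
    l_seg = F.cross_entropy(seg_logits, seg_target, ignore_index=255)
    # Auxiliary boundary supervision on a one-channel boundary head (assumed BCE).
    l_bnd = F.binary_cross_entropy_with_logits(bnd_logits, bnd_target)
    return l_seg + beta * l_bnd
```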
Tab. 4 Comparison results of different methods on Cityscapes dataset

| Method | Input image size | Parameters/MB | GPU | mIoU/% (val) | mIoU/% (test) | Frame rate/(frame·s⁻¹) |
|---|---|---|---|---|---|---|
| ICNet[19] | 2 048×1 024 | 26.50 | TitanX M | — | 69.5 | 30.0 |
| BiSeNet[18] | 1 536×768 | 5.80 | GTX 1080Ti | 69.0 | 68.4 | 105.8 |
| BiSeNetV2[28] | 1 024×512 | — | GTX 1080Ti | 73.4 | 72.6 | 156.0 |
| STDC1-Seg75[29] | 1 536×768 | 14.20 | RTX 2080Ti | 74.5 | 75.3 | 74.6 |
| STDC2-Seg75[29] | 1 536×768 | 22.20 | RTX 2080Ti | 77.0 | 76.8 | 73.5 |
| PP-LiteSeg-T2[43] | 1 536×768 | — | RTX 2080Ti | 76.0 | 74.9 | 91.5 |
| PP-LiteSeg-B2[43] | 1 536×768 | — | RTX 2080Ti | 77.8 | 77.1 | 79.1 |
| HLFGNet[23] | 2 048×1 024 | 50.53 | RTX 2080Ti | 76.6 | 75.4 | 75.0 |
| MSFNet[24] | 2 048×1 024 | — | RTX 2080Ti | — | 77.1 | 41.0 |
| SGCPNet[25] | 2 048×1 024 | 0.61 | RTX 2080Ti | — | 70.9 | 106.5 |
| DDRNet-23-slim[22] | 2 048×1 024 | 5.71 | RTX 2080Ti | 77.1 | 77.4 | 101.6 |
| RTFormer-slim[44] | 2 048×1 024 | 4.80 | RTX 2080Ti | 76.1 | 75.4 | 89.6 |
| DEDBNet | 2 048×1 024 | 5.73 | RTX 2080Ti | 78.2 | 77.8 | 92.3 |
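The frame rates in Tab. 4 and Tab. 6 are inference throughput figures on the listed GPUs. A rough, commonly used measurement routine is sketched below (warm-up iterations plus explicit CUDA synchronization); the batch size, iteration counts, and fp32 precision are assumptions and may differ from the paper's timing protocol.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, size=(1, 3, 1024, 2048), warmup=50, iters=200, device="cuda"):
    """Frames per second for a segmentation model (sketch of a typical protocol).

    Assumes batch size 1 and random input at Cityscapes resolution; the paper's
    exact timing setup may differ.
    """
    model = model.to(device).eval()
    x = torch.randn(*size, device=device)
    for _ in range(warmup):            # warm-up: CUDA init and cuDNN autotuning
        model(x)
    torch.cuda.synchronize()           # make sure warm-up work has finished
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()           # wait for all kernels before stopping the clock
    return iters / (time.perf_counter() - start)
```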
Tab. 5 Accuracy for each category on Cityscapes test set (IoU/%)

| Category | BiSeNet[18] | DDRNet-23-slim[22] | DEDBNet |
|---|---|---|---|
| mIoU | 74.5 | 77.1 | 78.2 |
| road | 98.2 | 98.1 | 98.2 |
| sidewalk | 83.2 | 84.4 | 85.4 |
| building | 91.6 | 92.1 | 92.5 |
| wall | 45.0 | 56.8 | 58.1 |
| fence | 50.7 | 60.2 | 61.9 |
| pole | 62.0 | 62.7 | 63.6 |
| traffic light | 71.3 | 68.7 | 69.5 |
| traffic sign | 74.6 | 76.6 | 76.6 |
| vegetation | 92.8 | 92.1 | 92.3 |
| terrain | 70.4 | 66.7 | 64.9 |
| sky | 94.9 | 94.6 | 94.6 |
| person | 83.4 | 80.8 | 80.6 |
| rider | 66.2 | 62.1 | 59.4 |
| car | 94.9 | 94.8 | 94.9 |
| truck | 61.4 | 80.3 | 83.3 |
| bus | 75.5 | 85.7 | 89.5 |
| train | 67.0 | 78.8 | 80.8 |
| motorcycle | 61.2 | 53.8 | 61.9 |
| bicycle | 72.3 | 74.6 | 75.7 |
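The per-category figures in Tab. 5 are class-wise IoU values, and the mIoU row is their mean. For reference, the standard Cityscapes-style computation from a confusion matrix is sketched below; this is the conventional metric definition, not code from the paper.

```python
import numpy as np

def per_class_iou(conf: np.ndarray) -> np.ndarray:
    """Class-wise IoU from a confusion matrix (rows: ground truth, cols: prediction).

    IoU_c = TP_c / (TP_c + FP_c + FN_c); classes absent from both prediction and
    ground truth yield NaN and are skipped when averaging.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class c but labeled otherwise
    fn = conf.sum(axis=1) - tp          # labeled as class c but predicted otherwise
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

def mean_iou(conf: np.ndarray) -> float:
    """mIoU: mean of the valid per-class IoU values."""
    return float(np.nanmean(per_class_iou(conf)))
```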
Tab. 6 Performance comparison of different methods on CamVid test set

| Method | GPU | mIoU/% | Frame rate/(frame·s⁻¹) |
|---|---|---|---|
| ICNet[19] | TitanX | 67.1 | 27.8 |
| BiSeNet1[18] | GTX 1080Ti | 65.6 | 175.0 |
| BiSeNet2[18] | GTX 1080Ti | 68.7 | 116.3 |
| BiSeNetV2[28] | GTX 1080Ti | 72.4 | 124.5 |
| BiSeNetV2-L[28] | GTX 1080Ti | 73.2 | 32.7 |
| STDC1-Seg[29] | RTX 2080Ti | 73.0 | 125.6 |
| STDC2-Seg[29] | RTX 2080Ti | 73.9 | 100.5 |
| HLFGNet[23] | RTX 2080Ti | 70.9 | 96.2 |
| MSFNet[24] | RTX 2080Ti | 75.4 | 91.0 |
| SGCPNet[25] | RTX 2080Ti | 69.0 | 278.4 |
| DDRNet-23-slim[22] | RTX 2080Ti | 74.7 | 217.0 |
| DEDBNet | RTX 2080Ti | 79.2 | 202.8 |

References
1 | LIU Z, LI X, LUO P, et al. Deep learning Markov random field for semantic segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(8): 1814-1828. |
2 | JING L, CHEN Y, TIAN Y. Coarse-to-fine semantic segmentation from image-level labels [J]. IEEE Transactions on Image Processing, 2020, 29: 225-236. |
3 | REN X, AHMAD S, ZHANG L, et al. Task decomposition and synchronization for semantic biomedical image segmentation [J]. IEEE Transactions on Image Processing, 2020, 29: 7497-7510. |
4 | SAHA M, CHAKRABORTY C. Her2Net: a deep framework for semantic segmentation and classification of cell membranes and nuclei in breast cancer evaluation [J]. IEEE Transactions on Image Processing, 2018, 27(5): 2189-2200. |
5 | ROMERA E, ÁLVAREZ J M, BERGASA L M, et al. ERFNet: efficient residual factorized ConvNet for real-time semantic segmentation [J]. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(1): 263-272. |
6 | LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3431-3440. |
7 | CHEN L-C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. (2014-12-22) [2023-04-10]. . |
8 | CHEN L-C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. |
9 | CHEN L-C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. (2017-06-17) [2023-04-10]. . |
10 | CHEN L-C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 833-851. |
11 | RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation [C]// Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer, 2015: 234-241. |
12 | ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6230-6239. |
13 | WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. |
14 | LIN G, MILAN A, SHEN C, et al. RefineNet: multi-path refinement networks for high-resolution semantic segmentation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5168-5177. |
15 | WANG J, SUN K, CHENG T, et al. Deep high-resolution representation learning for visual recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(10): 3349-3364. |
16 | PASZKE A, CHAURASIA A, KIM S, et al. ENet: a deep neural network architecture for real-time semantic segmentation [EB/OL]. (2016-06-07) [2023-04-10]. . |
17 | WEN K, TANG W W, XIONG J C. Real-time segmentation algorithm based on attention mechanism and effective factorized convolution [J]. Journal of Computer Applications, 2022, 42(9): 2659-2666. |
18 | YU C, WANG J, PENG C, et al. BiSeNet: bilateral segmentation network for real-time semantic segmentation [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 334-349. |
19 | ZHAO H, QI X, SHEN X, et al. ICNet for real-time semantic segmentation on high-resolution images [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 418-434. |
20 | POUDEL P P K, LIWICKI S, CIPOLLA R. Fast-SCNN: fast semantic segmentation network [EB/OL]. (2019-02-12) [2023-04-15]. . |
21 | LI H, XIONG P, FAN H, et al. DFANet: deep feature aggregation for real-time semantic segmentation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9514-9523. |
22 | PAN H, HONG Y, SUN W, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes [J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(3): 3448-3460. |
23 | YU Z X, QU S J, HE X, et al. High-low dimensional feature guided real-time semantic segmentation network [J]. Journal of Computer Applications, 2023, 43(10): 3077-3085. |
24 | SI H, ZHANG Z, LV F, et al. Real-time semantic segmentation via multiply spatial fusion network [C]// Proceedings of the 2020 British Machine Vision Virtual Conference. Durham: British Machine Vision Association, 2020: 0678-0689. |
25 | HAO S, ZHOU Y, GUO Y, et al. Real-time semantic segmentation via spatial-detail guided context propagation [J]. IEEE Transactions on Neural Networks and Learning Systems, 2022(Early Access): 1-12. |
26 | CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3213-3223. |
27 | BROSTOW G J, FAUQUEUR J, CIPOLLA R. Semantic object classes in video: a high-definition ground truth database [J]. Pattern Recognition Letters, 2009, 30(2): 88-97. |
28 | YU C, GAO C, WANG J, et al. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation [J]. International Journal of Computer Vision, 2021, 129(11): 3051-3068. |
29 | FAN M, LAI S, HUANG J, et al. Rethinking BiSeNet for real-time semantic segmentation [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 9711-9720. |
30 | FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3141-3149. |
31 | MA H, YANG H, HUANG D. Boundary guided context aggregation for semantic segmentation [C]// Proceedings of the 2021 British Machine Vision Virtual Conference. Durham: British Machine Vision Association, 2021: 0091-0103. |
32 | HUO Z Q, JIA H Y, QIAO Y X, et al. Boundary-aware real-time semantic segmentation network [J]. Computer Engineering and Applications, 2022, 58(17): 165-173. |
33 | XU J, XIONG Z, BHATTACHARYYA S P. PIDNet: a real-time semantic segmentation network inspired by PID controllers [C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 19529-19539. |
34 | WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. |
35 | WU Y, JIANG J, HUANG Z, et al. FPANet: feature pyramid aggregation network for real-time semantic segmentation[J]. Applied Intelligence, 2022, 52(3): 3319-3336. |
36 | LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. |
37 | HUANG Z, WEI Y, WANG X, et al. AlignSeg: feature-aligned segmentation networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 550-557. |
38 | LI X, ZHAO H, HAN L, et al. GFF: gated fully fusion for semantic segmentation [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI Press, 2020: 11418-11425. |
39 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory [J]. Neural Computation, 1997, 9(8): 1735-1780. |
40 | ORŠIC M, KREŠO I, BEVANDIC P, et al. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 12599-12608. |
41 | RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge [J]. International Journal of Computer Vision, 2015, 115(3): 211-252. |
42 | SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 761-769. |
43 | PENG J, LIU Y, TANG S, et al. PP-LiteSeg: a superior real-time semantic segmentation model [EB/OL]. (2022-04-06) [2023-10-24]. . |
44 | WANG J, GOU C, WU Q, et al. RTFormer: efficient design for real-time semantic segmentation with Transformer [C]// Proceedings of the 2022 International Conference on Neural Information Processing Systems. Red Hook: Curran Associates, 2022:7423-7436. |