DeepLabV3+ image segmentation algorithm fusing cumulative distribution function and channel attention mechanism

Journal of Computer Applications, 2023, Vol. 43, Issue (3): 936-942. DOI: 10.11772/j.issn.1001-9081.2022020210

Topic: Multimedia computing and computer simulation

Xuedong HE¹, Shibin XUAN¹,², Kuan WANG¹, Mengnan CHEN¹

1. School of Artificial Intelligence, Guangxi Minzu University, Nanning Guangxi 530006, China
2. Guangxi Key Laboratory of Hybrid Computation and IC Design and Analysis (Guangxi Minzu University), Nanning Guangxi 530006, China

Received: 2022-02-24 Revised: 2022-05-25 Accepted: 2022-05-25 Online: 2022-08-16 Published: 2023-03-10
Corresponding author: Shibin XUAN
About the authors:
HE Xuedong, born in 1997 in Songyuan, Jilin, M. S. candidate, CCF member. His research interests include semantic segmentation and computer vision.
XUAN Shibin, born in 1964 in Wuwei, Anhui, Ph. D., professor. His research interests include image processing and recognition.
WANG Kuan, born in 1995 in Hai'an, Jiangsu, M. S. candidate. His research interests include pose estimation and deep learning.
CHEN Mengnan, born in 1997 in Changzhi, Shanxi, M. S. candidate. His research interests include algorithm optimization and computational intelligence.
Supported by:
National Natural Science Foundation of China (61866003)

Abstract

To address two problems of DeepLabV3+ in semantic segmentation, namely that the low-level features of the backbone are not fully utilized and that effective features are lost when upsampling by a large factor, a Cumulative Distribution Channel Attention DeepLabV3+ (CDCA-DLV3+) model is proposed. First, Cumulative Distribution Channel Attention (CDCA) is proposed based on the cumulative distribution function and channel attention. Then, CDCA is used to obtain effective low-level features from the backbone. Finally, a Feature Pyramid Network (FPN) is adopted for feature fusion and gradual upsampling, which avoids the feature loss caused by large-factor upsampling. On the Pascal Visual Object Classes (VOC) 2012 validation set and the Cityscapes dataset, the mean Intersection over Union (mIoU) of the CDCA-DLV3+ model is 80.09% and 80.11% respectively, which is 1.24 and 1.02 percentage points higher than that of the DeepLabV3+ model. Experimental results show that the proposed model produces more accurate segmentation results.

Key words: deep learning, image semantic segmentation, channel attention mechanism, DeepLabV3+, cumulative distribution function
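The abstract names the two ingredients of CDCA (a cumulative distribution function and channel attention) but this page does not say how they are combined. As a purely hypothetical illustration, and not the authors' module, the sketch below shows one way a CDF can act as a channel gate: each channel is squeezed to a scalar by global average pooling, the scalars are standardized across channels, and the standard normal CDF maps them to weights in (0, 1). The function names `normal_cdf` and `cdf_channel_gate`, the choice of the normal CDF, and the standardization step are all assumptions made for this example.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_channel_gate(features):
    """Hypothetical CDF-based channel gate (not the paper's CDCA).

    features: list of channels, each channel a 2-D list of floats.
    Returns (weights, gated_features).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in features]

    # Standardize the descriptors across channels (assumption).
    mean = sum(desc) / len(desc)
    var = sum((d - mean) ** 2 for d in desc) / len(desc)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 if all descriptors are equal

    # Excite: the CDF turns each standardized descriptor into a weight in (0, 1).
    weights = [normal_cdf((d - mean) / std) for d in desc]

    # Scale every value of a channel by that channel's weight.
    gated = [[[v * w for v in row] for row in ch]
             for ch, w in zip(features, weights)]
    return weights, gated


# Three constant 2x2 channels with means 0, 1 and 2.
demo = [[[float(c)] * 2 for _ in range(2)] for c in range(3)]
demo_weights, demo_gated = cdf_channel_gate(demo)
```

In this toy input the middle channel sits exactly at the mean, so its weight is 0.5, while the channel with the largest mean receives the largest weight. A learned module would normally add trainable parameters, which this sketch omits.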

CLC number: TP183

Cite this article:

Xuedong HE, Shibin XUAN, Kuan WANG, Mengnan CHEN. DeepLabV3+ image segmentation algorithm fusing cumulative distribution function and channel attention mechanism[J]. Journal of Computer Applications, 2023, 43(3): 936-942.

Figures/Tables 10

Fig.1 DeepLabV3+ network structure

Fig.2 CDCA-DLV3+ network structure

Fig.3 CDCA-DLV3+ network structure details

Fig.4 Structure of CDCA module

Fig.5 Feature pyramid network structure
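The abstract states that an FPN is used for feature fusion and gradual upsampling instead of a single large-factor upsampling step. The figure itself is not reproduced on this page, so the following is only a minimal sketch of the generic top-down idea: start from the coarsest map, upsample by 2, add the next finer map, and repeat. It works on single-channel maps stored as nested lists, uses nearest-neighbour upsampling, and leaves out the lateral 1x1 and smoothing convolutions of a real FPN. The helper names `upsample2x`, `add_maps` and `top_down_fuse` are hypothetical.

```python
def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows)."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two maps of identical shape."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(pyramid):
    """Fuse maps ordered coarse to fine, each twice the size of the previous.

    Resolution grows by a factor of 2 per step, so no single step has to
    bridge a large upsampling factor.
    """
    fused = pyramid[0]
    for finer in pyramid[1:]:
        fused = add_maps(upsample2x(fused), finer)
    return fused


coarse = [[1]]
middle = [[1, 0],
          [0, 1]]
fine = [[0] * 4 for _ in range(4)]
fused_map = top_down_fuse([coarse, middle, fine])
```

Here a 1x1 map is lifted to 4x4 in two 2x steps, and the finer maps contribute their detail at each step.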

Tab. 1 Machine software and hardware configuration

Item	Details
CPU	Intel Xeon Silver 4114
Memory	256 GB
GPU	RTX 8000
Operating system	Ubuntu 20.04
CUDA	CUDA 11.4
Python	Python 3.6
PyTorch	PyTorch 1.8.1

Tab. 2 Influence of CDCA module and ASPP atrous rate on network

CDCA at 1/8	CDCA at 1/16	ASPP atrous rates	mIoU/%	Floating-point operations/GFLOPs	Parameters/10⁶
—	—	(6, 12, 18)	78.85	92.82	59.23
✓	—	(6, 12, 18)	79.30	93.66	59.43
—	✓	(6, 12, 18)	79.02	108.44	60.94
✓	✓	(6, 12, 18)	79.56	121.71	62.18
—	—	(4, 8, 12, 16)	79.14	98.03	64.02
✓	—	(4, 8, 12, 16)	80.09	98.87	64.22
—	✓	(4, 8, 12, 16)	79.29	113.65	65.72
✓	✓	(4, 8, 12, 16)	79.80	126.92	66.96

Tab. 3 Comparison results of different models

Model	mIoU/%	Floating-point operations/GFLOPs	Parameters/10⁶	Training time/h
DeepLabV2	76.35	75.40	61.41	—
DeepLabV3	77.21	71.16	58.04	—
DeepLabV3+	78.85	92.93	59.23	9.8
Improved DeepLabV3+	79.97	99.53	64.65	—
Model 1	79.30	93.66	59.43	10.0
Model 2	80.09	98.87	64.22	11.2
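The accuracy and cost trade-off in Table 3 can be read off with a little arithmetic. The sketch below uses only the DeepLabV3+ and Model 2 rows of the table. The dictionary keys and the helper names `point_gain` and `relative_increase` are chosen for this example.

```python
# Values copied from the DeepLabV3+ and Model 2 rows of Table 3.
deeplabv3_plus = {"miou": 78.85, "gflops": 92.93, "params_m": 59.23, "train_h": 9.8}
model_2 = {"miou": 80.09, "gflops": 98.87, "params_m": 64.22, "train_h": 11.2}


def point_gain(new, old):
    """Absolute difference in percentage points."""
    return round(new - old, 2)


def relative_increase(new, old):
    """Relative increase of new over old, in percent."""
    return round(100.0 * (new - old) / old, 2)


miou_gain = point_gain(model_2["miou"], deeplabv3_plus["miou"])
cost = {key: relative_increase(model_2[key], deeplabv3_plus[key])
        for key in ("gflops", "params_m", "train_h")}

print(f"mIoU gain: {miou_gain} percentage points")
for key, value in cost.items():
    print(f"{key}: +{value}%")
```

The mIoU gain matches the 1.24 percentage points quoted in the abstract, and it comes with roughly 6% more floating-point operations, 8% more parameters and 14% longer training.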

Fig.6 Comparison of predictions of Model 2 and DeepLabV3+ network on PASCAL dataset

Tab. 4 mIoU comparison on Cityscapes dataset (%)

Model	mIoU
DeepLabV3+	79.09
Model 1	79.68
Model 2	80.11
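All comparisons above are reported as mIoU. For reference, here is a minimal sketch of how mean Intersection over Union is commonly computed from predicted and ground-truth label maps. The flattened-list input, the `ignore_index` value of 255 and the function name `mean_iou` are assumptions for this example. Benchmark evaluations usually accumulate the per-class counts over the whole dataset before averaging, which this single-call sketch does not show.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class labels of equal length.
    Pixels whose target label equals ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Correct pixel: belongs to both prediction and ground truth of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the predicted and the true class.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


pred_labels = [0, 0, 1, 1, 2, 2]
true_labels = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred_labels, true_labels, num_classes=3)
```

For this toy input the per-class IoUs are 1/2, 2/3 and 1, so the mean is about 0.722.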


