DeepLabV3+ image segmentation algorithm fusing cumulative distribution function and channel attention mechanism

doi:10.11772/j.issn.1001-9081.2022020210

Journal of Computer Applications, 2023, Vol. 43, Issue (3): 936-942. DOI: 10.11772/j.issn.1001-9081.2022020210

Special Issue: Multimedia computing and computer simulation

Xuedong HE¹, Shibin XUAN¹,², Kuan WANG¹, Mengnan CHEN¹

1. School of Artificial Intelligence, Guangxi Minzu University, Nanning Guangxi 530006, China
2. Guangxi Key Laboratory of Hybrid Computation and IC Design and Analysis (Guangxi Minzu University), Nanning Guangxi 530006, China

Received: 2022-02-24 Revised: 2022-05-25 Accepted: 2022-05-25 Online: 2022-08-16 Published: 2023-03-10
Contact: Shibin XUAN
About author: HE Xuedong, born in 1997 in Songyuan, Jilin, M. S. candidate, CCF member. His research interests include semantic segmentation and computer vision.
XUAN Shibin, born in 1964 in Wuwei, Anhui, Ph. D., professor. His research interests include image processing and recognition.
WANG Kuan, born in 1995 in Hai'an, Jiangsu, M. S. candidate. His research interests include pose estimation and deep learning.
CHEN Mengnan, born in 1997 in Changzhi, Shanxi, M. S. candidate. His research interests include algorithm optimization and computational intelligence.
Supported by: National Natural Science Foundation of China (61866003)

Abstract

To address two weaknesses of DeepLabV3+ semantic segmentation, namely that the low-level features of the backbone are not fully utilized and that effective features are lost through large-factor upsampling, a Cumulative Distribution Channel Attention DeepLabV3+ (CDCA-DLV3+) model was proposed. Firstly, a Cumulative Distribution Channel Attention (CDCA) mechanism was proposed based on the cumulative distribution function and channel attention. Then, CDCA was used to obtain the effective low-level features of the backbone. Finally, a Feature Pyramid Network (FPN) was adopted for feature fusion and gradual upsampling, avoiding the feature loss caused by large-factor upsampling. On the Pascal Visual Object Classes (VOC) 2012 validation set and the Cityscapes dataset, the mean Intersection over Union (mIoU) of the CDCA-DLV3+ model was 80.09% and 80.11% respectively, which was 1.24 and 1.02 percentage points higher than that of the DeepLabV3+ model. Experimental results show that the proposed model produces more accurate segmentation results.
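
The abstract does not spell out how CDCA combines the cumulative distribution function with channel attention. Purely as an illustration, the sketch below assumes a squeeze-and-excitation-style gate in which each channel's global average is standardized across channels and passed through the standard normal CDF to obtain a weight in (0, 1). The function names `normal_cdf` and `cdf_channel_gate`, the choice of the Gaussian CDF, and the standardization step are all assumptions made here, not the authors' formulation.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_channel_gate(feature_map, eps=1e-5):
    """Hypothetical CDF-based channel gate (an assumption, not the paper's CDCA).

    feature_map: list of channels, each channel a list of rows (H x W floats).
    Returns (weights, gated_feature_map).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    means = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Standardize the descriptors across channels.
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    std = math.sqrt(var + eps)
    # Gate: the CDF maps each standardized descriptor into (0, 1).
    weights = [normal_cdf((m - mu) / std) for m in means]
    # Rescale every value of a channel by that channel's weight.
    gated = [
        [[w * v for v in row] for row in ch]
        for ch, w in zip(feature_map, weights)
    ]
    return weights, gated


# Three 2 x 2 channels whose means are 0, 1 and 2.
fmap = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
weights, gated = cdf_channel_gate(fmap)
```

Under this assumption a channel whose descriptor equals the cross-channel mean receives a weight of 0.5, and channels above or below the mean are emphasized or suppressed symmetrically.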

Key words: deep learning, image semantic segmentation, channel attention mechanism, DeepLabV3+, cumulative distribution function

CLC Number: TP183

Xuedong HE, Shibin XUAN, Kuan WANG, Mengnan CHEN. DeepLabV3+ image segmentation algorithm fusing cumulative distribution function and channel attention mechanism[J]. Journal of Computer Applications, 2023, 43(3): 936-942.

Software/hardware item	Configuration
CPU	Intel Xeon Silver 4114
Memory	256 GB
GPU	RTX 8000
Operating system	Ubuntu 20.04
CUDA	CUDA 11.4
Python	Python 3.6
PyTorch	PyTorch 1.8.1

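1/8	1/16	ASPP dilation rates	mIoU/%	Floating-point operations/GFLOPs	Parameters/10⁶
—	—	(6, 12, 18)	78.85	92.82	59.23
√	—	(6, 12, 18)	79.30	93.66	59.43
—	√	(6, 12, 18)	79.02	108.44	60.94
√	√	(6, 12, 18)	79.56	121.71	62.18
—	—	(4, 8, 12, 16)	79.14	98.03	64.02
√	—	(4, 8, 12, 16)	80.09	98.87	64.22
—	√	(4, 8, 12, 16)	79.29	113.65	65.72
√	√	(4, 8, 12, 16)	79.80	126.92	66.96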

Model	mIoU/%	Floating-point operations/GFLOPs	Parameters/10⁶	Training time/h
DeepLabV2	76.35	75.40	61.41	—
DeepLabV3	77.21	71.16	58.04	—
DeepLabV3+	78.85	92.93	59.23	9.8
Improved DeepLabV3+	79.97	99.53	64.65	—
Model 1	79.30	93.66	59.43	10.0
Model 2	80.09	98.87	64.22	11.2
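
The accuracy gain and its cost can be read off the comparison table with simple arithmetic. The sketch below copies the DeepLabV3+ and Model 2 rows from that table and reports the mIoU gain in percentage points together with the relative increase in floating-point operations, parameters, and training time. The dictionary layout and the helper name `compare` are illustrative choices made here, not part of the original work.

```python
# Rows copied from the comparison table: mIoU in %, GFLOPs,
# parameters in millions, training time in hours.
baseline = {"miou": 78.85, "gflops": 92.93, "params_m": 59.23, "train_h": 9.8}   # DeepLabV3+
model_2 = {"miou": 80.09, "gflops": 98.87, "params_m": 64.22, "train_h": 11.2}   # Model 2


def relative_increase_pct(base, new):
    """Relative increase of new over base, in percent."""
    return 100.0 * (new - base) / base


def compare(base, new):
    """Summarize the accuracy gain and the relative cost of a model."""
    return {
        # mIoU values are percentages, so their difference is in percentage points.
        "miou_gain_pp": round(new["miou"] - base["miou"], 2),
        "gflops_increase_pct": round(relative_increase_pct(base["gflops"], new["gflops"]), 1),
        "params_increase_pct": round(relative_increase_pct(base["params_m"], new["params_m"]), 1),
        "train_time_increase_pct": round(relative_increase_pct(base["train_h"], new["train_h"]), 1),
    }


result = compare(baseline, model_2)
for key, value in result.items():
    print(f"{key}: {value}")
```

The mIoU gain computed this way matches the 1.24 percentage points quoted in the abstract for Pascal VOC 2012.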

Model	mIoU/%
DeepLabV3+	79.09
Model 1	79.68
Model 2	80.11
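
For readers unfamiliar with the metric reported in these tables, the following is a minimal sketch of how mean Intersection over Union is commonly computed from a pixel-level confusion matrix. The function name `mean_iou` and the toy matrix are illustrative. Skipping classes that appear in neither the prediction nor the ground truth is one common convention and is an assumption here, since benchmark toolkits differ on this point.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                                   # correctly labeled pixels
        fn = sum(confusion[c]) - tp                            # class c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp       # pixels wrongly labeled as c
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(tp / union)
    if not ious:
        raise ValueError("confusion matrix contains no pixels")
    return sum(ious) / len(ious)


# Toy two-class example: IoU is 50/65 for class 0 and 35/50 for class 1.
toy_confusion = [
    [50, 10],
    [5, 35],
]
toy_miou = mean_iou(toy_confusion)
print(f"mIoU = {100 * toy_miou:.2f}%")
```

Multiplying the result by 100 gives the percentage form used in the tables.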
