Improved DeepLabV3+ method based on adaptive attention and nested receptive field

doi:10.11772/j.issn.1001-9081.2025050595

Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1408-1415.DOI: 10.11772/j.issn.1001-9081.2025050595

• Artificial intelligence • Previous Articles

Improved DeepLabV3+ method based on adaptive attention and nested receptive field

Changzheng XING, Xin ZHENG(), Di JIA, Junfeng LIANG

School of Electronics and Information Engineering，Liaoning Technical University，Huludao Liaoning 125105，China

Received:2025-06-03 Revised:2025-08-29 Accepted:2025-09-09 Online:2025-09-15 Published:2026-05-10
Contact: Xin ZHENG
About author:XING Changzheng， born in 1967， Ph. D.，professor. His research interests include artificial intelligence， information processing.
JIA Di， born in 1982， Ph. D.， professor. His research interests include stereo matching and 3D reconstruction， photogrammetry， visual spatial positioning， visual robotic arm operations.
First author contact:LIANG Jungfeng， born in 2000， M. S. candidate. His research interests include data mining，reinforcement learning.
Supported by:
National Key Research and Development Program of China(2018YFB402900);Key Project of Educational Department of Liaoning Province(LJ212410147003)

基于自适应注意力与嵌套感受野改进DeepLabV3+方法

邢长征, 郑鑫(), 贾迪, 梁浚锋

辽宁工程技术大学电子与信息工程学院，辽宁葫芦岛 125105

通讯作者: 郑鑫
作者简介:邢长征（1967—），男，辽宁阜新人，教授，博士，CCF会员，主要研究方向：人工智能、信息处理
贾迪（1982—），男，河北邢台人，教授，博士，主要研究方向：立体匹配与三维重建、摄影测量、视觉空间定位、视觉机械臂作业
梁浚锋（2000—），男，河南焦作人，硕士研究生，主要研究方向：数据挖掘、强化学习。
基金资助:
国家重点研发计划项目(2018YFB402900);辽宁省教育厅重点项目(LJ212410147003)

Abstract

Abstract:

To address the problems of high complexity and low segmentation accuracy for certain classes in DeepLabV3+ caused by atrous convolutions with different dilation rates， an improved method that integrates Evolutionary Nested Receptive Field （ENRF） module with Adaptive Class-Channel Attention （ACCA） mechanism was proposed. In this method， the original Atrous Spatial Pyramid Pooling （ASPP） module was replaced by ENRF module， and ACCA mechanism was incorporated into the fused features， enabling continuous expansion of receptive field and more fine-grained feature representation， and reducing the number of parameters and computational overhead to enhance the model’s efficiency and lightweightness. Firstly， ACCA mechanism was constructed by combining channel-adaptive and class-adaptive attention mechanisms， which exploited inter-channel and inter-class feature dependencies to strengthen the representation of critical information in feature maps. Secondly， ENRF module was designed by introducing convolution kernels of different sizes and dilation rates， forming a nested evolutionary receptive field structure that gradually enlarged the receptive field to capture multi-scale contextual information and fine-grained boundary details. The improved method was compared with Fully Convolutional Network with 8s skip connections （FCN8s）， Pyramid Scene Parsing Network （PSPNet）， Unified Perceptual parsing Network （UPerNet）， Bilateral Segmentation Network Version 2 （BiSeNet V2）， Deep Feature Aggregation Network （DFANet）， and the original DeepLabV3+ in terms of FLOPs （FLoating-point OPerations）， parameter count， mean Intersection over Union （mIoU）， inference speed， and memory usage. Experimental results show that the improved DeepLabV3+ reduces parameters and FLOPs， accelerates inference， and improves segmentation performance.

Key words: nested evolution, lightweighting, feature dependency, dilation rate, DeepLabV3+

摘要：

针对DeepLabV3+模型因使用不同膨胀率空洞卷积导致复杂度高及部分类别分割精度低的问题，提出一种融合进化式嵌套感受野（ENRF）模块与自适应类别通道注意力（ACCA）机制的改进方法。该方法将原有空洞空间卷积池化金字塔（ASPP）模块替换为ENRF模块，并在融合特征中引入ACCA机制，实现了感受野的连续拓展与更精细化的特征表达，同时降低了参数量和计算开销，提升了模型的轻量化水平。首先，ACCA机制通过融合通道自适应注意力与类别自适应2种注意力机制，挖掘通道间和类别间的特征依赖关系，提升特征图中关键信息的表达能力；其次，ENRF模块引入不同大小和不同膨胀率的卷积核，构建了一种基于嵌套感受野演化的网络结构，以扩大特征图的感受野，捕捉多尺度的上下文信息及细粒度的边缘特征。与全卷积网络（FCN8s）、金字塔场景解析网络（PSPNet）、统一感知解析网络（UPerNet）、双向分割网络（BiSeNet V2）、深度特征聚合网络（DFANet）以及原始DeepLabV3+在浮点运算次数（FLOPs）、参数量、均值交并比（mIoU）、推理速度和内存占用5个指标上进行对比实验的结果表明，改进后的DeepLabV3+方法在减少参数量和FLOPs的同时，也提高了推理速度并改善了图像分割性能。

关键词: 嵌套演化, 轻量化, 特征依赖, 膨胀率, DeepLabV3+

CLC Number:

TP751

Changzheng XING, Xin ZHENG, Di JIA, Junfeng LIANG. Improved DeepLabV3+ method based on adaptive attention and nested receptive field[J]. Journal of Computer Applications, 2026, 46(5): 1408-1415.

邢长征, 郑鑫, 贾迪, 梁浚锋. 基于自适应注意力与嵌套感受野改进DeepLabV3+方法[J]. 《计算机应用》唯一官方网站, 2026, 46(5): 1408-1415.

Figures/Tables 10

References 29

[1]	王碧瑶，韩毅，崔航滨，等.基于图像的道路语义分割检测方法［J］.山东大学学报（工学版），2023，53（5）：37-47.
	WANG B Y， HAN Y， CUI H B， et al. Road semantic segmentation detection method based on image［J］. Journal of Shandong University （Engineering Science）， 2023， 53（5）： 37-47.
[2]	刘云翔，管钎汛，石艳娇.基于语义分割的复杂驾驶场景障碍物检测［J］.计算机仿真，2023，40（12）：167-171.
	LIU Y X， GUAN Q X， SHI Y J. Obstacle detection in complex driving scenarios based on semantic segmentation［J］. Computer Simulation， 2023， 40（12）： 167-171.
[3]	宋建丽，吕晓琪，谷宇.语义流引导采样结合注意力机制的脑肿瘤图像分割［J］.光学精密工程，2024，32（4）：565-577.
	SONG J L， LYU X Q， GU Y. Brain tumor image segmentation based on semantic flow guided sampling and attention mechanism［J］. Optics and Precision Engineering， 2024， 32（4）： 565-577.
[4]	汪华登，王雪馨，黎兵兵，等.GZMH：用于有丝分裂细胞核检测和分割的乳腺癌病理图像数据集［J］.中国图象图形学报，2024，29（3）：608-619.
	WANG H D， WANG X X， LI B B， et al. GZMH： a dataset of breast cancer pathological images for mitosis nuclei detection and segmentation［J］. Journal of Image and Graphics， 2024， 29（3）： 608-619.
[5]	王雅丽.基于改进Swin-Unet腹部多器官图像分割方法研究［J］.现代计算机，2023，29（3）：81-84.
	WANG Y L. Research on abdominal multi organ image segmentation based on improved Swin-Unet［J］. Modern Computer， 2023， 29（3）： 81-84.
[6]	彭明，丁汉泽，刘艳芳，等.解耦融合机制的金属表面缺陷小样本分割网络［J］.闽南师范大学学报（自然科学版），2024，37（3）： 57-70.
	PENG M， DING H Z， LIU Y F， et al. Decoupling fusion mechanism-based network for metal surface defect few-shot segmentation［J］. Journal of Minnan Normal University （Natural Science）， 2024， 37（3）： 57-70.
[7]	LONG J， SHELHAMER E， DARRELL T. Fully convolutional networks for semantic segmentation［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 3431-3440.
[8]	YEGNANARAYANA B. Artificial neural networks［M］. Delhi： PHI Learning Pvt. Ltd.， 2004： 1-2.
[9]	RONNEBERGER O， FISCHER P， BROX T. U-Net： convolutional networks for biomedical image segmentation［C］// Proceedings of the 2015 Medical Image Computing and Computer-Assisted Intervention， LNCS 9351. Cham： Springer， 2015： 234-241.
[10]	ZHAO H， SHI J， QI X， et al. Pyramid scene parsing network［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 6230-6239.
[11]	CHEN L C， PAPANDREOU G， KOKKINOS I， et al. DeepLab： semantic image segmentation with deep convolutional nets， atrous convolution， and fully connected CRFs［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2018， 40（4）： 834-848.
[12]	KRÄHENBÜHL P， KOLTUN V. Efficient inference in fully connected CRFs with Gaussian edge potentials［C］// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook： Curran Associates Inc.， 2011： 109-117.
[13]	CHEN L C， ZHU Y， PAPANDREOU G， et al. Encoder-decoder with atrous separable convolution for semantic image segmentation［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11211. Cham： Springer， 2018： 833-851.
[14]	LIU R， TAO F， LIU X， et al. RAANet： a residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images［J］. Remote Sensing， 2022， 14（13）： No.3109.
[15]	SUN X， ZHANG Y， CHEN C， et al. High-order paired-ASPP for deep semantic segmentation networks［J］. Information Sciences， 2023， 646： No.119364.
[16]	XI Y， LI S， XU Z， et al. LapUNet： a novel approach to monocular depth estimation using dynamic Laplacian residual U‑shape networks［J］. Scientific Reports， 2024， 14： No.23544.
[17]	DING P， QIAN H， ZHOU Y， et al. Real-time efficient semantic segmentation network based on improved ASPP and parallel fusion module in complex scenes［J］. Journal of Real-Time Image Processing， 2023， 20（3）： No.41.
[18]	LI Y， YUAN G， WEN Y， et al. EfficientFormer： Vision Transformers at MobileNet speed［C］// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook： Curran Associates Inc.， 2022： 12934-12949.
[19]	LIDA T， KOMATSU T， KANEDA K， et al. Visual explanation generation based on lambda attention branch networks［C］// Proceedings of the 2022 Asian Conference on Computer Vision， LNCS 13842. Cham： Springer， 2023： 475-490.
[20]	SHAKER A， MAAZ M， RASHEED H， et al. SwiftFormer： efficient additive attention for Transformer-based real-time mobile vision applications［C］// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2023： 17379-17390.
[21]	FENG X， DU H， FAN H， et al. SEFormer： structure embedding Transformer for 3D object detection［C］// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto： AAAI Press， 2023： 632-640.
[22]	HENDRYCKS D， GIMPEL K. Gaussian Error Linear Units （GELUs）［EB/OL］. ［2025-04-11］..
[23]	EVERINGHAM M， ESLAMI S M A， VAN GOOL L， et al. The PASCAL visual object classes challenge： a retrospective［J］. International Journal of Computer Vision， 2015， 111（1）： 98-136.
[24]	CORDTS M， OMRAN M， RAMOS S， et al. The Cityscapes dataset for semantic urban scene understanding［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 3213-3223.
[25]	XIAO T， LIU Y， ZHOU B， et al. Unified perceptual parsing for scene understanding［C］// Proceedings of the 2018 European Conference on Computer Vision. Berlin： Springer， 2018： 418-434.
[26]	YU C， GAO C， WANG J， et al. BiSeNet V2： bilateral network with guided aggregation for real-time semantic segmentation［J］. International Journal of Computer Vision， 2021， 129： 3051-3068.
[27]	LI H， XIONG P， FAN H， et al. Dfanet： Deep feature aggregation for real-time semantic segmentation［C］// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 9522-9531.
[28]	CHOLLET F. Xception： deep learning with depthwise separable convolutions［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 1800-1807.
[29]	HE K， ZHANG X， REN S， et al. Deep residual learning for image recognition［C］// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 770-778.

算法	骨干	VOC 2012			Cityscapes
算法	骨干	GFLOPs	Params/10⁶	mIoU	GFLOPs	Params/10⁶	mIoU
FCN8s	VGG16	1 843.42	134.36	0.428 8	1 953.42	134.36	0.431 0
PSPNet	ResNet101	2 102.23	65.00	0.573 0	2 299.80	65.00	0.579 4
UPerNet	Resnet50	1 464.55	126.58	0.546 0	1 583.99	126.58	0.551 1
BiSeNet V2	—	58.68	49.00	0.586 7	55.36	49.00	0.592 5
DFANet	—	3.67	7.80	0.571 9	3.48	7.80	0.580 1
DeepLabV3+	MobileNetV2	15.89	6.00	0.596 6	5.66	6.00	0.602 8
DeepLabV3+ENRF+ACCA	MobileNetV2	13.08	5.22	0.611 2	4.04	5.22	0.628 3

算法	骨干	VOC 2012			Cityscapes
算法	骨干	GFLOPs	Params/10⁶	mIoU	GFLOPs	Params/10⁶	mIoU
FCN8s	VGG16	1 843.42	134.36	0.428 8	1 953.42	134.36	0.431 0
PSPNet	ResNet101	2 102.23	65.00	0.573 0	2 299.80	65.00	0.579 4
UPerNet	Resnet50	1 464.55	126.58	0.546 0	1 583.99	126.58	0.551 1
BiSeNet V2	—	58.68	49.00	0.586 7	55.36	49.00	0.592 5
DFANet	—	3.67	7.80	0.571 9	3.48	7.80	0.580 1
DeepLabV3+	MobileNetV2	15.89	6.00	0.596 6	5.66	6.00	0.602 8
DeepLabV3+ENRF+ACCA	MobileNetV2	13.08	5.22	0.611 2	4.04	5.22	0.628 3

算法	骨干	GTX 1050		GTX 1660 Ti
算法	骨干	推理时间/ms	内存占用/MB	推理时间/ms	内存占用/MB
FCN8s	VGG16	1.834 0	2 678.086 0	0.946 4	3 488.108 6
PSPNet	ResNet101	10.045 0	4 557.440 3	1.002 0	4 557.440 8
UPerNet	Resnet50	1.956 6	2 131.594 8	0.658 6	3 235.591 5
DeepLabV3+	MobileNetV2	0.394 4	811.024 5	0.091 4	1 953.716 8
DeepLabV3+ENRF+ACCA	MobileNetV2	0.287 0	809.990 3	0.088 0	1 950.961 3

算法	骨干	GTX 1050		GTX 1660 Ti
算法	骨干	推理时间/ms	内存占用/MB	推理时间/ms	内存占用/MB
FCN8s	VGG16	1.834 0	2 678.086 0	0.946 4	3 488.108 6
PSPNet	ResNet101	10.045 0	4 557.440 3	1.002 0	4 557.440 8
UPerNet	Resnet50	1.956 6	2 131.594 8	0.658 6	3 235.591 5
DeepLabV3+	MobileNetV2	0.394 4	811.024 5	0.091 4	1 953.716 8
DeepLabV3+ENRF+ACCA	MobileNetV2	0.287 0	809.990 3	0.088 0	1 950.961 3

算法	骨干	VOC 2012			Cityscapes
算法	骨干	GFLOPs	Params/10⁶	mIoU	GFLOPs	Params/10⁶	mIoU
DeepLabV3+ ACCA	MobileNetV2	16.78	6.38	0.602 3	5.98	6.38	0.617 2
DeepLabV3+ ENRF	MobileNetV2	12.64	5.84	0.582 2	3.64	5.84	0.596 3
DeepLabV3+	Xception	40.56	37.05	0.631 7	10.70	37.05	0.623 2
DeepLabV3+ENRF+ACCA	Xception	34.89	28.48	0.649 8	9.40	28.48	0.631 1
DeepLabV3+	ResNet101	66.74	58.75	0.692 0	15.34	58.75	0.673 1
DeepLabV3+ENRF+ACCA	ResNet101	58.85	49.19	0.708 4	14.28	49.19	0.695 1

Improved DeepLabV3+ method based on adaptive attention and nested receptive field

基于自适应注意力与嵌套感受野改进DeepLabV3+方法

RichHTML

PDF

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 10

References 29

Related Articles 7

Recommended Articles

Metrics

[1]	Ping HUANG, Qing LI, Haifeng QIU, Chengsi WANG, Anzi HUANG, Xiang ZHANG. Real-time face blurring method based on head skeleton point detection [J]. Journal of Computer Applications, 2026, 46(2): 596-603.
[2]	Ning CAO, Xin WEN, Yanrong HAO, Rui CAO. Lightweight motor imagery electroencephalogram decoding neural network with multi-domain feature fusion [J]. Journal of Computer Applications, 2026, 46(1): 289-296.
[3]	Yun ZHANG, Shuying WANG, Qing ZHENG, Haizhu ZHANG. Lightweight algorithm of 3D mesh model for preserving detailed geometric features [J]. Journal of Computer Applications, 2023, 43(4): 1226-1232.
[4]	Xuedong HE, Shibin XUAN, Kuan WANG, Mengnan CHEN. DeepLabV3+ image segmentation algorithm fusing cumulative distribution function and channel attention mechanism [J]. Journal of Computer Applications, 2023, 43(3): 936-942.
[5]	Xiaofei JI, Kexin ZHANG, Lirong TANG. Book spine segmentation algorithm based on improved DeepLabv3+ network [J]. Journal of Computer Applications, 2023, 43(12): 3927-3932.
[6]	LIN Leping, LI Sanfeng, OUYANG Ning. Multi-pose feature fusion generative adversarial network based face reconstruction method [J]. Journal of Computer Applications, 2020, 40(10): 2856-2862.
[7]	DaiZhong LUO WenYun ZHAO. Approach of feature dependency modeling for software product line [J]. Journal of Computer Applications, 2008, 28(9): 2349-2352.