Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (4): 1120-1129. DOI: 10.11772/j.issn.1001-9081.2024040415
Kunyuan JIANG1, Xiaoxia LI1,2(), Li WANG3, Yaodan CAO3, Xiaoqiang ZHANG1,2, Nan DING1, Yingyue ZHOU1,2
Received:
2024-04-11
Revised:
2024-06-26
Accepted:
2024-06-28
Online:
2025-04-08
Published:
2025-04-10
Contact:
Xiaoxia LI
About author:
JIANG Kunyuan, born in 2000 in Zibo, Shandong, M. S. candidate, CCF member. Her research interests include pattern recognition and medical image processing.
Supported by:
Abstract:
To address the loss of lesion-edge information and the incomplete segmentation of large lesions in endoscopic semantic segmentation networks, a Boundary-Cross Supervised semantic Segmentation Network (BCS-SegNet) incorporating Decoupled Residual self-Attention (DRA) is proposed. First, DRA is introduced to strengthen the network's ability to learn long-range correlations among lesions. Second, a Cross-Level Fusion (CLF) module is constructed to combine the multi-level feature maps of the encoder pairwise, fusing image detail with semantic information at low computational cost. Finally, multi-direction, multi-scale 2D Gabor transforms are used to extract edge information, and spatial attention weights the edge features in the feature maps to supervise the decoding process of the segmentation network, providing more precise intra-class segmentation consistency at the pixel level. Experimental results show that BCS-SegNet achieves a mean Intersection over Union (mIoU) and Dice coefficient of 84.27% and 90.68% on the ISIC2018 dermoscopy dataset, and 79.24% and 87.91% on the Kvasir-SEG/CVC-ClinicDB colonoscopy datasets; on a self-built esophageal endoscopy dataset it reaches an mIoU of 82.73% and a Dice coefficient of 90.84%, relative mIoU improvements of 3.30% and 4.97% over U-net and UCTransNet, respectively. The proposed network therefore produces more complete segmentation regions and sharper edge details.
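The boundary supervision branch described above builds an edge prior from multi-direction, multi-scale 2D Gabor responses. The following NumPy sketch illustrates only that extraction step; the kernel sizes, σ, λ, γ and the number of orientations are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor kernel at orientation theta (illustrative defaults)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam + psi)

def conv2d(img, k):
    """Same-size 2D correlation with reflect padding (naive loops, for clarity)."""
    half = k.shape[0] // 2
    padded = np.pad(img, half, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def gabor_edge_map(img, n_orient=4, sizes=(7, 11)):
    """Max |response| over orientations and scales -> boundary prior in [0, 1]."""
    resp = np.zeros_like(img, dtype=float)
    for s in sizes:
        for t in np.linspace(0.0, np.pi, n_orient, endpoint=False):
            resp = np.maximum(resp, np.abs(conv2d(img, gabor_kernel(size=s, theta=t))))
    rng = resp.max() - resp.min()
    return (resp - resp.min()) / (rng + 1e-8)        # min-max normalize
```

In the paper this edge map is further weighted by spatial attention and used to supervise the decoder; the sketch stops at the raw Gabor boundary prior.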
CLC number:
Kunyuan JIANG, Xiaoxia LI, Li WANG, Yaodan CAO, Xiaoqiang ZHANG, Nan DING, Yingyue ZHOU. Boundary-cross supervised semantic segmentation network with decoupled residual self-attention[J]. Journal of Computer Applications, 2025, 45(4): 1120-1129.
| Image type | Training samples | Validation samples | Test samples | Total |
| --- | --- | --- | --- | --- |
| Dermoscopy images | 2 047 | 260 | 260 | 2 567 |
| Colonoscopy images | 1 450 | 162 | 160 | 1 772 |
| Esophageal endoscopy images | 2 552 | 310 | 310 | 3 172 |

Tab. 1 Numbers of samples in each experimental dataset
| Network type | Network | mIoU/% | Dice/% | FLOPs/GFLOPs | Parameters/10⁶ | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- |
| CNN | U-net | 80.09 | 87.19 | 226.15 | 24.89 | 20 |
| | DeepLabV3+ | 79.80 | 88.53 | 264.60 | 70.07 | 12 |
| | U2-Net | 74.43 | 83.31 | 150.61 | 43.99 | 18 |
| | EGE-UNet | 63.40 | 75.46 | 0.28 | 0.04 | 26 |
| Transformer+CNN | MedT | 68.55 | 76.92 | 70.89 | 10.80 | 24 |
| | TransUnet | 81.21 | 87.86 | 129.29 | 93.23 | 14 |
| | UCTransNet | 78.81 | 86.90 | 172.01 | 66.24 | 12 |
| | BCS-SegNet | 82.73 | 90.84 | 232.71 | 24.98 | 19 |

Tab. 2 Comparison of results of different networks on the self-built esophageal endoscopy dataset
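Tables 2 and 3 report mean Intersection over Union (mIoU) and the Dice coefficient. A minimal sketch of both metrics for a binary lesion mask follows; the convention of averaging IoU over the foreground and background classes is an assumption about how mIoU is computed here:

```python
import numpy as np

def miou_dice(pred, gt):
    """Return (mIoU, Dice) for binary masks.
    mIoU averages IoU over foreground and background; Dice is foreground-only."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    ious = []
    for cls in (True, False):                      # foreground, then background
        p, g = (pred == cls), (gt == cls)
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / union if union else 1.0)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)
    return float(np.mean(ious)), float(dice)
```

For identical masks both metrics approach 1.0; partially overlapping masks give the familiar IoU < Dice ordering on the foreground class.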
| Network type | Method | ISIC2018 mIoU/% | ISIC2018 Dice/% | Kvasir-SEG/CVC-ClinicDB mIoU/% | Kvasir-SEG/CVC-ClinicDB Dice/% | FLOPs/GFLOPs | Parameters/10⁶ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | U-net | 77.28 | 87.15 | 78.48 | 87.63 | 226.15 | 24.89 |
| | UPerNet | 77.37 | 89.39 | 75.44 | 84.18 | 30.73 | 27.39 |
| | UNeXt | 72.27 | 82.52 | 75.96 | 86.19 | 2.30 | 1.47 |
| | EGE-UNet | 80.12 | 88.96 | 68.97 | 81.64 | 0.28 | 0.04 |
| Transformer+CNN | MedT | 71.19 | 82.41 | 76.79 | 84.32 | 70.89 | 10.80 |
| | BCS-SegNet | 84.27 | 90.68 | 79.24 | 87.91 | 232.70 | 24.98 |

Tab. 3 Comparison of results of different networks on public datasets
| No. | Network | mIoU/% | Dice/% |
| --- | --- | --- | --- |
| 1 | U-net | 80.09 | 87.19 |
| 2 | +DRA | 81.33 | 88.24 |
| 3 | +CLF | 80.61 | 88.12 |
| 4 | +BSD | 82.70 | 90.72 |
| 5 | +DRA, CLF | 81.52 | 88.31 |
| 6 | +DRA, CLF, BSD (BCS-SegNet) | 82.73 | 90.84 |

Tab. 4 Ablation experimental results for each module
| f₁ | f₂ | f₃ | f₄ | f₅ | mIoU/% | Dice/% | FLOPs/GFLOPs | Parameters/10⁶ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| √ | | | | | 81.47 | 88.36 | 235.48 | 24.91 |
| | √ | | | | 81.33 | 88.24 | 232.30 | 24.98 |
| | | √ | | | 80.19 | 87.35 | 231.76 | 25.23 |
| | | | √ | | 80.16 | 86.96 | 231.42 | 26.21 |
| | | | | √ | 79.44 | 85.87 | 227.52 | 26.21 |

Tab. 5 Ablation experimental results of decoupled residual self-attention
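Table 5 varies the encoder level (f₁-f₅) at which DRA is inserted. The paper's decoupled residual formulation is not reproduced here; as generic background only, plain scaled dot-product self-attention with a residual connection over flattened feature-map pixels can be sketched as:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Generic residual self-attention over x of shape (n_tokens, d).
    w_q, w_k, w_v are (d, d) projection matrices; NOT the paper's exact DRA."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # scaled dot-product
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # row-wise softmax
    return x + attn @ v                            # residual connection
```

Inserting such a block at a deeper level shrinks the token count (smaller spatial map) but raises the channel width, which is consistent with the FLOPs/parameter trend in Table 5.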
| No. | Fusion strategy | mIoU/% | Dice/% |
| --- | --- | --- | --- |
| 1 | Fusion | 80.61 | 88.12 |
| 2 | Fusion | 80.38 | 87.54 |
| 3 | Fusion | 79.43 | 86.39 |

Tab. 6 Ablation experimental results of cross-level fusion
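Table 6 compares pairwise fusion strategies for the CLF module. A generic sketch of fusing one shallow/deep encoder pair, nearest-neighbour upsampling of the deeper map followed by channel concatenation and a 1×1 convolution, is given below; the actual CLF design may differ:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_pair(shallow, deep, w):
    """Pairwise cross-level fusion sketch.
    shallow: (Cs, H, W); deep: (Cd, H/2, W/2); w: (Cout, Cs+Cd) 1x1-conv weights."""
    cat = np.concatenate([shallow, upsample2x(deep)], axis=0)    # (Cs+Cd, H, W)
    c, h, wd = cat.shape
    return (w @ cat.reshape(c, -1)).reshape(w.shape[0], h, wd)   # 1x1 convolution
```

Because the mixing is a 1×1 convolution over concatenated channels, the cost grows only with channel count, not with kernel area, which matches the abstract's "low computational cost" claim for CLF.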
[1] DONG Y N, LIU T L, DAI X B, et al. Medical image processing theory and applications[M]. Nanjing: Southeast University Press, 2020: 44-54.
[2] DOU M, CHEN Z B, WANG X, et al. Review of multi-modal medical image segmentation based on deep learning[J]. Journal of Computer Applications, 2023, 43(11): 3385-3395.
[3] GONZALEZ R C, WOODS R E. Digital image processing, fourth edition[M]. RUAN Q Q, RUAN Y Z, translated. Beijing: Publishing House of Electronics Industry, 2020: 504-572.
[4] RONNEBERGER O, FISCHER P, BROX T. U-net: convolutional networks for biomedical image segmentation[C]// Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 9351. Cham: Springer, 2015: 234-241.
[5] ASGARI TAGHANAKI S, ABHISHEK K, COHEN J P, et al. Deep semantic segmentation of natural and medical images: a review[J]. Artificial Intelligence Review, 2021, 54: 137-178.
[6] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 833-851.
[7] XIAO T, LIU Y, ZHOU B, et al. Unified perceptual parsing for scene understanding[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11209. Cham: Springer, 2018: 432-448.
[8] QIN X B, ZHANG Z C, HUANG C Y, et al. U2-Net: going deeper with nested U-structure for salient object detection[J]. Pattern Recognition, 2020, 106: No.107404.
[9] VALANARASU J M J, PATEL V M. UNeXt: MLP-based rapid medical image segmentation network[C]// Proceedings of the 2022 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 13435. Cham: Springer, 2022: 23-33.
[10] D'ASCOLI S, TOUVRON H, LEAVITT M L, et al. ConViT: improving vision Transformers with soft convolutional inductive biases[C]// Proceedings of the 38th International Conference on Machine Learning. New York: JMLR.org, 2021: 2286-2296.
[11] AZAD R, KAZEROUNI A, HEIDARI M, et al. Advances in medical image analysis with vision Transformers: a comprehensive review[J]. Medical Image Analysis, 2024, 91: No.103000.
[12] CHEN J, LU Y, YU Q, et al. TransUNet: Transformers make strong encoders for medical image segmentation[EB/OL]. [2023-11-21].
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[14] WANG H, CAO P, WANG J, et al. UCTransNet: rethinking the skip connections in U-Net from a channel-wise perspective with Transformer[C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 2441-2449.
[15] ZHANG Y, LIU H Y, HU Q. TransFuse: fusing Transformers and CNNs for medical image segmentation[C]// Proceedings of the 2021 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 12901. Cham: Springer, 2021: 14-24.
[16] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19.
[17] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[18] VALANARASU J M J, OZA P, HACIHALILOGLU I, et al. Medical Transformer: gated axial-attention for medical image segmentation[C]// Proceedings of the 2021 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 12901. Cham: Springer, 2021: 36-46.
[19] RUAN J, XIE M, GAO J, et al. EGE-UNet: an efficient group enhanced UNet for skin lesion segmentation[C]// Proceedings of the 2023 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 14223. Cham: Springer, 2023: 481-490.
[20] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520.
[21] KOKKINOS I. Pushing the boundaries of boundary detection using deep learning[EB/OL]. [2023-11-21].
[22] NONG Z, SU X, LIU Y, et al. Boundary-aware dual-stream network for VHR remote sensing images semantic segmentation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, 14: 5260-5268.
[23] CONG R, YANG H, JIANG Q, et al. BCS-Net: boundary, context, and semantic for automatic COVID-19 lung infection segmentation from CT images[J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71: No.5019011.
[24] CHEN F L, LIU H L, ZENG Z H, et al. BES-Net: boundary enhancing semantic context network for high-resolution image semantic segmentation[J]. Remote Sensing, 2022, 14(7): No.1638.
[25] LIN Y, ZHANG D, FANG X, et al. Rethinking boundary detection in deep learning models for medical image segmentation[C]// Proceedings of the 2023 International Conference on Information Processing in Medical Imaging, LNCS 13939. Cham: Springer, 2023: 730-742.
[26] FU L Y, YIN M X, YANG F. Transformer based U-shaped medical image segmentation network: a survey[J]. Journal of Computer Applications, 2023, 43(5): 1584-1595.
[27] ZHU X A, CAO L. Wavelet analysis and its application in digital image processing[M]. Beijing: Publishing House of Electronics Industry, 2012: 163-169, 213-221.
[28] MILLETARI F, NAVAB N, AHMADI S A. V-Net: fully convolutional neural networks for volumetric medical image segmentation[C]// Proceedings of the 4th International Conference on 3D Vision. Piscataway: IEEE, 2016: 565-571.
[29] CODELLA N, ROTEMBERG V, TSCHANDL P, et al. Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC)[EB/OL]. [2023-12-02].
[30] JHA D, SMEDSRUD P H, RIEGLER M A, et al. Kvasir-SEG: a segmented polyp dataset[C]// Proceedings of the 2020 International Conference on Multimedia Modeling, LNCS 11962. Cham: Springer, 2020: 451-462.
[31] TAJBAKHSH N, GURUDU S R, LIANG J. Automated polyp detection in colonoscopy videos using shape and context information[J]. IEEE Transactions on Medical Imaging, 2016, 35(2): 630-644.
[32] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. [2023-12-02].