Boundary-cross supervised semantic segmentation network with decoupled residual self-attention

Journal of Computer Applications, 2025, Vol. 45, Issue 4: 1120-1129. DOI: 10.11772/j.issn.1001-9081.2024040415

• Artificial intelligence •

Kunyuan JIANG¹, Xiaoxia LI¹,², Li WANG³, Yaodan CAO³, Xiaoqiang ZHANG¹,², Nan DING¹, Yingyue ZHOU¹,²

1. School of Information Engineering, Southwest University of Science and Technology, Mianyang Sichuan 621010, China
2. Sichuan Industrial Autonomous and Controllable Artificial Intelligence Engineering Technology Research Center, Mianyang Sichuan 621010, China
3. Mianyang 404 Hospital, Mianyang Sichuan 621000, China

Received: 2024-04-11 Revised: 2024-06-26 Accepted: 2024-06-28 Online: 2025-04-08 Published: 2025-04-10
Contact: Xiaoxia LI
About author: JIANG Kunyuan, born in 2000, native of Zibo, Shandong, M. S. candidate, CCF member. Her research interests include pattern recognition and medical image processing.
LI Xiaoxia, born in 1976, native of Anyue, Sichuan, Ph. D., professor. Her research interests include computer vision, pattern recognition, and signal and information processing.
WANG Li, born in 1989, native of Bazhong, Sichuan, M. S., attending physician. Her research interests include endoscopic diagnosis and treatment of gastrointestinal tumors.
CAO Yaodan, born in 1991, native of Liangshan, Sichuan, M. S., attending physician. Her research interests include gastrointestinal tumors.
ZHANG Xiaoqiang, born in 1987, native of Tai'an, Shandong, Ph. D., lecturer. His research interests include computer vision and computational imaging.
DING Nan, born in 1999, native of Zhoukou, Henan, M. S. candidate. His research interests include pattern recognition and medical image processing.
ZHOU Yingyue (Tibetan), born in 1983, native of Barkam, Sichuan, Ph. D., associate research fellow. Her research interests include image processing and analysis.
Supported by:
National Natural Science Foundation of China (62071399); Sichuan Science and Technology Program (2023YFG0262)

Abstract

To address the loss of edge information and the incomplete segmentation of large lesions in endoscopic semantic segmentation networks, a Boundary-Cross Supervised semantic Segmentation Network (BCS-SegNet) with Decoupled Residual Self-Attention (DRA) was proposed. Firstly, DRA was introduced to strengthen the network's ability to learn lesions with long-range correlations. Secondly, a Cross Level Fusion (CLF) module was constructed to combine the multi-level feature maps of the encoding structure in pairs, so as to fuse image details with semantic information at low computational cost. Finally, a multi-directional, multi-scale 2D Gabor transform was used to extract edge information, and spatial attention was used to weight the edge features in the feature maps so as to supervise the decoding process of the segmentation network, thereby providing more accurate intra-class segmentation consistency at the pixel level. Experimental results show that BCS-SegNet achieves a mean Intersection over Union (mIoU) and Dice coefficient of 84.27% and 90.68% on the ISIC2018 dermoscopy dataset, and of 79.24% and 87.91% on the Kvasir-SEG/CVC-ClinicDB colonoscopy dataset. On the self-built esophageal endoscopy dataset, BCS-SegNet achieves an mIoU of 82.73% and a Dice coefficient of 90.84%; in relative terms, this mIoU is 3.30% higher than that of U-net and 4.97% higher than that of UCTransNet (2.64 and 3.92 percentage points, respectively; see Tab. 2). Visually, the proposed network produces more complete segmentation regions and clearer edge details.

Key words: esophageal endoscopic image, medical image segmentation, self-attention mechanism, 2D Gabor transform, multi-scale edge feature
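
The two metrics reported in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes binary masks flattened to 0/1 lists and averages per-image scores; the page does not state how the authors aggregate over images or classes, so the function names and the averaging rule here are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Iterable, Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, Dice) for two flattened binary masks (1 = lesion pixel)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + target_area)
    return iou, dice


def mean_scores(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average IoU and Dice over (pred, target) pairs, one pair per image."""
    scores = [iou_and_dice(p, t) for p, t in pairs]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n
```

For a single image the two scores are tied by Dice = 2·IoU / (1 + IoU), which is why Dice is always at least as large as IoU in the tables on this page.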

CLC Number: TP391.4

Kunyuan JIANG, Xiaoxia LI, Li WANG, Yaodan CAO, Xiaoqiang ZHANG, Nan DING, Yingyue ZHOU. Boundary-cross supervised semantic segmentation network with decoupled residual self-attention[J]. Journal of Computer Applications, 2025, 45(4): 1120-1129.

Figures/Tables 17

Fig. 1 Overall structure of BCS-SegNet

Fig. 2 Decoupled self-attention module

Fig. 3 Cross level fusion module

Fig. 4 Output visualization of cross level fusion module

Fig. 5 Real parts of Gabor filters with five scales and four orientations

Fig. 6 Real part responses obtained by convolving endoscopic image with 20 Gabor filters

Fig. 7 Boundary supervised decoding module

Fig. 8 Magnitude responses obtained by convolving endoscopic image with 20 Gabor filters
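
The filter bank named in Figs. 5, 6 and 8 (five scales, four orientations, real-part and magnitude responses) can be sketched as follows. This is a generic complex Gabor formulation in plain Python, not the authors' implementation: the kernel size, the wavelengths, the `sigma = 0.56 * wavelength` ratio, `gamma` and `psi` are all assumed values chosen only for illustration.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Complex 2D Gabor kernel as a size x size nested list (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            carrier = cmath.exp(1j * (2 * math.pi * xr / wavelength + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size=9, wavelengths=(3, 4, 6, 8, 12), n_orientations=4):
    """Bank of len(wavelengths) * n_orientations kernels (5 x 4 = 20 by default)."""
    bank = []
    for lam in wavelengths:  # assumed scales, one wavelength per scale
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations  # 0, 45, 90, 135 degrees
            bank.append(gabor_kernel(size, lam, theta, sigma=0.56 * lam))
    return bank


def response_at(image, kernel, cy, cx):
    """Complex convolution response at pixel (cy, cx); outside pixels count as 0."""
    half = len(kernel) // 2
    total = 0j
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy - dy, cx - dx  # flipped indices: true convolution
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += image[y][x] * kernel[dy + half][dx + half]
    return total
```

Taking `.real` of a response gives a real-part map value of the kind shown in Fig. 6, and `abs()` gives the magnitude of the kind shown in Fig. 8. A filter whose carrier runs across an edge responds much more strongly than one whose carrier runs along it, which is what makes the bank orientation-selective.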

Tab. 1 Number of samples in each experimental dataset

Image type	Training samples	Validation samples	Test samples	Total
Dermoscopy images	2 047	260	260	2 567
Colonoscopy images	1 450	162	160	1 772
Esophageal endoscopy images	2 552	310	310	3 172

Tab. 2 Comparison of results of different networks on self-built esophageal endoscopy dataset

Network type	Network	mIoU/%	Dice/%	Computation/GFLOPs	Parameters/10⁶	Frame rate/(frame·s⁻¹)
CNN	U-net	80.09	87.19	226.15	24.89	20
	DeepLabV3+	79.80	88.53	264.60	70.07	12
	U²-Net	74.43	83.31	150.61	43.99	18
	EGE-UNet	63.40	75.46	0.28	0.04	26
Transformer+CNN	MedT	68.55	76.92	70.89	10.80	24
	TransUnet	81.21	87.86	129.29	93.23	14
	UCTransNet	78.81	86.90	172.01	66.24	12
	BCS-SegNet	82.73	90.84	232.71	24.98	19
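
The 3.30% and 4.97% mIoU gains quoted in the abstract can be reproduced from Tab. 2 only as relative improvements, not as percentage-point differences. The short check below uses the three mIoU values from the table; the helper name `relative_gain` is illustrative.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# mIoU values (%) copied from Tab. 2.
miou = {"U-net": 80.09, "UCTransNet": 78.81, "BCS-SegNet": 82.73}

for name in ("U-net", "UCTransNet"):
    diff = miou["BCS-SegNet"] - miou[name]
    gain = relative_gain(miou["BCS-SegNet"], miou[name])
    print(f"{name}: +{diff:.2f} points, +{gain:.2f}% relative")
# prints:
# U-net: +2.64 points, +3.30% relative
# UCTransNet: +3.92 points, +4.97% relative
```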

Tab. 3 Comparison of results of different networks on public datasets

Network type	Method	ISIC2018		Kvasir-SEG/CVC-ClinicDB		Computation/GFLOPs	Parameters/10⁶
		mIoU/%	Dice/%	mIoU/%	Dice/%		
CNN	U-net	77.28	87.15	78.48	87.63	226.15	24.89
	UPerNet	77.37	89.39	75.44	84.18	30.73	27.39
	UNeXt	72.27	82.52	75.96	86.19	2.30	1.47
	EGE-UNet	80.12	88.96	68.97	81.64	0.28	0.04
Transformer+CNN	MedT	71.19	82.41	76.79	84.32	70.89	10.80
	BCS-SegNet	84.27	90.68	79.24	87.91	232.70	24.98

Fig. 9 Visualized results of comparison experiments on self-built esophageal endoscopy dataset

Fig. 10 Visualized results of comparison experiments on public datasets

Tab. 4 Ablation experimental results for each module

Experiment No.	Network	mIoU/%	Dice/%
1	U-net	80.09	87.19
2	+DRA	81.33	88.24
3	+CLF	80.61	88.12
4	+BSD	82.70	90.72
5	+DRA, CLF	81.52	88.31
6	+DRA, CLF, BSD (BCS-SegNet)	82.73	90.84

Fig. 11 Visual comparison of ablation experimental results

Tab. 5 Ablation experimental results of decoupled residual self-attention

f₁	f₂	f₃	f₄	f₅	mIoU/%	Dice/%	Computation/GFLOPs	Parameters/10⁶
√					81.47	88.36	235.48	24.91
	√				81.33	88.24	232.30	24.98
		√			80.19	87.35	231.76	25.23
			√		80.16	86.96	231.42	26.21
				√	79.44	85.87	227.52	26.21

Tab. 6 Ablation experimental results of cross level fusion

No.	Fusion strategy	mIoU/%	Dice/%
1	Fuse $(f_1, f_4), (f_2, f_4), (f_3, f_4)$	80.61	88.12
2	Fuse $(f_1, f_5), (f_2, f_5), (f_3, f_5)$	80.38	87.54
3	Fuse $(f_1, f_5), (f_2, f_5), (f_3, f_5), (f_4, f_5)$	79.43	86.39
