Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (3): 988-995. DOI: 10.11772/j.issn.1001-9081.2024030358

• Multimedia Computing and Computer Simulation •


Medical image segmentation network integrating multi-scale semantics and parallel double-branch

Baohua YUAN1, Jialu CHEN1, Huan WANG2

  1. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu 213159, China
    2. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
  • Received: 2024-04-01 Revised: 2024-05-26 Accepted: 2024-05-29 Online: 2024-06-17 Published: 2025-03-10
  • Contact: Baohua YUAN
  • About author: CHEN Jialu, born in 1997, M. S. candidate, CCF member. His research interests include medical image segmentation.
    WANG Huan, born in 1982, Ph. D., associate professor. His research interests include computer vision.
  • Supported by:
    National Natural Science Foundation of China(61703209)


Abstract:

In medical image segmentation networks, Convolutional Neural Networks (CNNs) extract rich local feature details but capture long-range information insufficiently, whereas Transformers capture long-range global feature dependencies but destroy local feature details. To fully exploit the complementarity of the two architectures, a parallel fusion network of CNN and Transformer for medical image segmentation, named PFNet, was proposed. In the parallel fusion module of this network, a pair of interdependent parallel branches based on CNN and Transformer was used to learn local and global discriminative features efficiently and to cross-fuse local features and long-range feature dependencies interactively. Meanwhile, to recover the spatial information lost during downsampling and thus enhance detail retention, a Multi-Scale Interaction (MSI) module was proposed to extract the local context of the multi-scale features generated by the hierarchical CNN branch for long-range dependency modeling. Experimental results show that PFNet outperforms advanced methods such as MISSFormer (Medical Image Segmentation tranSFormer) and UCTransNet (U-Net with Channel Transformer module): on the Synapse and ACDC (Automated Cardiac Diagnosis Challenge) datasets, compared with the best baseline method MISSFormer, PFNet improves the average Dice Similarity Coefficient (DSC) by 1.27% and 0.81%, respectively. These results indicate that PFNet achieves more accurate medical image segmentation.
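The parallel dual-branch idea described in the abstract can be illustrated with a minimal NumPy sketch: a convolutional branch that aggregates a local neighborhood, a self-attention branch that relates every position to every other, and a simple fusion of the two. This is a hypothetical toy on a single-channel 2D map for intuition only; the function names, the weighted-sum fusion, and the single-head, one-dimensional tokens are assumptions of this sketch, not the authors' actual PFNet implementation, which uses learned features and an interactive cross-fusion.

```python
import numpy as np

def local_branch(x, kernel):
    """CNN-style branch: 3x3 zero-padded convolution capturing local detail."""
    H, W = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * kernel)
    return out

def global_branch(x):
    """Transformer-style branch: single-head self-attention over all pixels,
    so every output position depends on the whole input (long-range modeling)."""
    H, W = x.shape
    tokens = x.reshape(-1, 1)                 # each pixel as a 1-dim token
    scores = tokens @ tokens.T                # pairwise similarity (Q = K = V)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)   # softmax over each row
    return (attn @ tokens).reshape(H, W)

def parallel_fusion(x, kernel, alpha=0.5):
    """Run both branches in parallel and fuse; here a plain weighted sum
    stands in for the interactive cross-fusion used in the paper."""
    return alpha * local_branch(x, kernel) + (1 - alpha) * global_branch(x)
```

With an identity kernel (all zeros except the center), the local branch returns its input unchanged, which makes the fusion's behavior easy to inspect on a small array.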

Key words: medical image segmentation, Transformer, Convolutional Neural Network (CNN), parallel fusion, Multi-Scale Interaction (MSI)

CLC number: