Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (3): 737-744. DOI: 10.11772/j.issn.1001-9081.2023040439

• Artificial intelligence •

Semantic segmentation method for remote sensing images based on multi-scale feature fusion

Ning WU1,2, Yangyang LUO1, Huajie XU1,3

1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China
  2. Guangxi Key Laboratory of Marine Engineering Equipment and Technology (Beibu Gulf University), Qinzhou, Guangxi 535011, China
  3. Guangxi Key Laboratory of Multimedia Communications and Network Technology (Guangxi University), Nanning, Guangxi 530004, China
  • Received: 2023-04-18  Revised: 2023-06-26  Accepted: 2023-06-30  Online: 2023-12-04  Published: 2024-03-10
  • Contact: Huajie XU
  • About author: WU Ning, born in 1980, Ph. D., research fellow. His research interests include image processing, pattern recognition, and machine vision.
    LUO Yangyang, born in 1998, M. S. candidate. Her research interests include semantic segmentation and deep learning.
  • Supported by:
    Science and Technology Plan Project of Chongzuo (FB2018001)


Abstract:

To improve the accuracy of semantic segmentation for remote sensing images and to address the loss of small-sized target information during feature extraction by a Deep Convolutional Neural Network (DCNN), a semantic segmentation method based on multi-scale feature fusion, named FuseSwin, was proposed. Firstly, an Attention Enhancement Module (AEM) was introduced into the Swin Transformer to highlight the target area and suppress background noise. Secondly, a Feature Pyramid Network (FPN) was used to fuse the detailed information and high-level semantic information of the multi-scale features, complementing the features of the target. Finally, an Atrous Spatial Pyramid Pooling (ASPP) module was applied to the fused feature map to capture the contextual information of the target and further improve segmentation accuracy. Experimental results demonstrate that the proposed method outperforms current mainstream segmentation methods: on the Potsdam remote sensing dataset, its mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU) are 2.34 and 3.23 percentage points higher than those of the DeepLabV3 method, and 1.28 and 1.75 percentage points higher than those of the SegFormer method. In addition, the proposed method was applied to the identification and segmentation of oyster rafts in high-resolution remote sensing images of the Maowei Sea in Qinzhou, Guangxi, achieving a Pixel Accuracy (PA) of 96.21% and an Intersection over Union (IoU) of 91.70%.
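The abstract fully specifies the pipeline (attention-enhanced multi-scale Swin Transformer features, FPN-style top-down fusion, then ASPP over the fused map), though no implementation is published with it. Below is a minimal PyTorch sketch of that pipeline under stated assumptions: the AEM internals (here a simple channel-and-spatial gate), the Swin-T stage widths (96/192/384/768), the FPN width of 256, and the ASPP dilation rates are illustrative guesses, not the authors' design; num_classes=6 matches the six Potsdam land-cover categories.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AEM(nn.Module):
    """Hypothetical Attention Enhancement Module: a channel gate followed by
    a spatial gate re-weights backbone features so that target regions are
    emphasized and background responses are suppressed (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_gate(x)          # per-channel re-weighting
        return x * self.spatial_gate(x)       # per-location re-weighting

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convolutions
    gather context at several receptive-field sizes; a 1x1 convolution
    projects the concatenated branches back to the working width."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class FuseSwinSketch(nn.Module):
    """AEM on each backbone stage -> FPN top-down fusion -> ASPP -> classifier.
    The Swin Transformer backbone itself is omitted; this module consumes its
    four stage outputs, finest resolution first."""
    def __init__(self, in_channels=(96, 192, 384, 768), fpn_ch=256, num_classes=6):
        super().__init__()
        self.aems = nn.ModuleList(AEM(c) for c in in_channels)
        self.laterals = nn.ModuleList(nn.Conv2d(c, fpn_ch, 1) for c in in_channels)
        self.smooth = nn.Conv2d(fpn_ch, fpn_ch, 3, padding=1)
        self.aspp = ASPP(fpn_ch, fpn_ch)
        self.classifier = nn.Conv2d(fpn_ch, num_classes, 1)

    def forward(self, feats):
        feats = [aem(f) for aem, f in zip(self.aems, feats)]
        maps = [lat(f) for lat, f in zip(self.laterals, feats)]
        for i in range(len(maps) - 1, 0, -1):   # top-down: coarse -> fine
            maps[i - 1] = maps[i - 1] + F.interpolate(
                maps[i], size=maps[i - 1].shape[-2:], mode="nearest")
        fused = self.smooth(maps[0])            # finest fused feature map
        return self.classifier(self.aspp(fused))

if __name__ == "__main__":
    # Stand-ins for the four Swin-T stage outputs of a 512x512 input.
    feats = [torch.randn(1, c, s, s)
             for c, s in zip((96, 192, 384, 768), (128, 64, 32, 16))]
    print(FuseSwinSketch()(feats).shape)        # torch.Size([1, 6, 128, 128])

Classifying on the finest fused level rather than the coarsest is what lets the small-target detail recovered by the FPN fusion survive to the prediction; at inference the logits would still be upsampled 4x to input resolution.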

Key words: remote sensing image, semantic segmentation, multi-scale, feature fusion, Swin Transformer

