Semantic segmentation method for remote sensing images based on multi-scale feature fusion

Journal of Computer Applications, 2024, Vol. 44, Issue (3): 737-744. DOI: 10.11772/j.issn.1001-9081.2023040439

Special topic: Artificial intelligence

Ning WU^1,2, Yangyang LUO^1, Huajie XU^1,3

^1. School of Computer, Electronics and Information, Guangxi University, Nanning Guangxi 530004, China
^2. Guangxi Key Laboratory of Marine Engineering Equipment and Technology (Beibu Gulf University), Qinzhou Guangxi 535011, China
^3. Guangxi Key Laboratory of Multimedia Communications and Network Technology (Guangxi University), Nanning Guangxi 530004, China

Received: 2023-04-18 Revised: 2023-06-26 Accepted: 2023-06-30 Online: 2023-12-04 Published: 2024-03-10
Contact: Huajie XU
About author: WU Ning, born in 1980 in Guigang, Guangxi, Ph.D., research fellow. His research interests include image processing, pattern recognition, and machine vision.
LUO Yangyang, born in 1998 in Tianyang, Guangxi (Zhuang ethnicity), M.S. candidate. Her research interests include semantic segmentation and deep learning.
Supported by: Science and Technology Plan Project of Chongzuo (FB2018001)

Abstract:

To improve the accuracy of semantic segmentation for remote sensing images and to address the loss of small-target information during feature extraction by Deep Convolutional Neural Networks (DCNN), a semantic segmentation method based on multi-scale feature fusion, named FuseSwin, is proposed. First, an Attention Enhancement Module (AEM) is introduced into the Swin Transformer to highlight the regions containing targets and suppress background noise. Second, a Feature Pyramid Network (FPN) fuses the detail information and the high-level semantic information of the multi-scale features to complement the target features. Finally, an Atrous Spatial Pyramid Pooling (ASPP) module captures further contextual information about the targets from the fused feature map, improving segmentation accuracy. Experimental results show that, on the Potsdam remote sensing dataset, the mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU) of the proposed method are 2.34 and 3.23 percentage points higher than those of DeepLabV3, and 1.28 and 1.75 percentage points higher than those of SegFormer, outperforming current mainstream segmentation methods. In addition, the proposed method was applied to identify and segment oyster rafts in high-resolution remote sensing images of the Maowei Sea in Qinzhou, Guangxi, where it achieved a Pixel Accuracy (PA) of 96.21% and an Intersection over Union (IoU) of 91.70%.

Key words: remote sensing image, semantic segmentation, multi-scale, feature fusion, Swin Transformer
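The abstract reports PA, IoU, mPA, and mIoU without defining them on this page. The sketch below computes them from a confusion matrix under the commonly used definitions (per-class PA = TP / ground-truth pixels, per-class IoU = TP / union of ground truth and prediction). These definitions, and the names `confusion_matrix` and `segmentation_metrics`, are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=np.int64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.int64).ravel()
    flat_index = y_true * num_classes + y_pred
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def segmentation_metrics(cm):
    """Per-class PA and IoU plus their means (assumed standard definitions)."""
    cm = cm.astype(np.float64)
    tp = np.diag(cm)                 # correctly labelled pixels per class
    gt = cm.sum(axis=1)              # ground-truth pixels per class
    pred = cm.sum(axis=0)            # predicted pixels per class
    union = gt + pred - tp
    # Classes that never occur get NaN and are skipped by nanmean.
    pa = np.divide(tp, gt, out=np.full_like(tp, np.nan), where=gt > 0)
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    return {"PA": pa, "IoU": iou, "mPA": np.nanmean(pa), "mIoU": np.nanmean(iou)}

# Toy 2x3 label maps with two classes (hypothetical data, not from the paper).
truth = [[0, 0, 1], [1, 1, 0]]
guess = [[0, 1, 1], [1, 1, 0]]
cm = confusion_matrix(truth, guess, num_classes=2)
metrics = segmentation_metrics(cm)
print(cm)
print({k: np.round(v, 4) for k, v in metrics.items()})
```

In the toy example one class-0 pixel is mislabelled as class 1, so class 0 loses both PA and IoU while class 1 keeps a PA of 1 but loses IoU through the extra predicted pixel.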

CLC number: TP751.1

Cite this article: Ning WU, Yangyang LUO, Huajie XU. Semantic segmentation method for remote sensing images based on multi-scale feature fusion[J]. Journal of Computer Applications, 2024, 44(3): 737-744.

Figures/Tables: 11

References: 24

[1] KOTARIDIS I, LAZARIDOU M. Remote sensing image segmentation advances: a meta-analysis [J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 173: 309-322. DOI: 10.1016/j.isprsjprs.2021.01.020
[2] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-05-22].
[3] ALEISSAEE A A, KUMAR A, ANWER R M, et al. Transformers in remote sensing: a survey [EB/OL]. [2023-02-11]. DOI: 10.3390/rs15071860
[4] NASEER M, RANASINGHE K, KHAN S H, et al. Intriguing properties of vision Transformers [J]. Advances in Neural Information Processing Systems, 2021, 34: 23296-23308.
[5] FU L Y, YIN M X, YANG F. Transformer based U-shaped medical image segmentation network: a survey [J]. Journal of Computer Applications, 2023, 43(5): 1584-1595.
[6] WANG L, XUAN S B, QIN X Y, et al. Multi-object tracking method based on dual-decoder Transformer [J]. Journal of Computer Applications, 2023, 43(6): 1919-1929.
[7] XU Z, ZHANG W, ZHANG T, et al. Efficient Transformer for remote sensing image segmentation [J]. Remote Sensing, 2021, 13(18): 3585. DOI: 10.3390/rs13183585
[8] YUAN X, SHI J, GU L. A review of deep learning methods for semantic segmentation of remote sensing imagery [J]. Expert Systems with Applications, 2021, 169: 114417. DOI: 10.1016/j.eswa.2020.114417
[9] ZHAO T, XU J, CHEN R, et al. Remote sensing image segmentation based on the fuzzy deep convolutional neural network [J]. International Journal of Remote Sensing, 2021, 42(16): 6264-6283. DOI: 10.1080/01431161.2021.1938738
[10] LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. DOI: 10.1109/cvpr.2017.106
[11] KIRILLOV A, GIRSHICK R, HE K, et al. Panoptic feature pyramid networks [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6392-6401. DOI: 10.1109/cvpr.2019.00656
[12] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3431-3440. DOI: 10.1109/cvpr.2015.7298965
[13] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation [C]// Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, LNCS 9351. Cham: Springer, 2015: 234-241.
[14] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6230-6239. DOI: 10.1109/cvpr.2017.660
[15] CHEN L-C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs [EB/OL]. (2014-12-22) [2023-01-10].
[16] CHEN L-C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. DOI: 10.1109/tpami.2017.2699184
[17] CHEN L-C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation [EB/OL]. [2023-01-10].
[18] CHEN L-C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 833-851. DOI: 10.1007/978-3-030-01234-2_49
[19] ZHENG S, LU J, ZHAO H, et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with Transformers [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 6877-6886. DOI: 10.1109/cvpr46437.2021.00681
[20] STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: Transformer for semantic segmentation [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 7242-7252. DOI: 10.1109/iccv48922.2021.00717
[21] XIE E, WANG W, YU Z, et al. SegFormer: simple and efficient design for semantic segmentation with Transformers [J]. Advances in Neural Information Processing Systems, 2021, 34: 12077-12090.
[22] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows [C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10012-10022. DOI: 10.1109/iccv48922.2021.00986
[23] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[24] International Society for Photogrammetry and Remote Sensing. 2D semantic labeling contest - Potsdam [DB/OL]. [2023-06-21].

Table: Comparison of segmentation methods on the Potsdam dataset
Category	Method	IoU/% (impervious surfaces)	IoU/% (buildings)	IoU/% (low vegetation)	IoU/% (trees)	IoU/% (cars)	Parameters/MB	Computation/GFLOPs	mPA/%	mIoU/%
CNN-based	PSPNet [14]	85.56	94.06	77.88	78.31	76.63	46.60	5.32	89.92	82.49
CNN-based	FCN [12]	85.33	94.23	77.42	77.31	78.22	47.13	5.49	89.96	82.51
CNN-based	DeepLabV3 [17]	85.56	94.04	77.79	78.35	78.46	65.74	6.36	90.69	82.84
Transformer-based	SETR [19]	82.03	93.98	76.72	77.62	77.43	310.65	40.66	88.43	81.56
Transformer-based	Segmenter [20]	83.19	93.92	77.80	78.76	78.92	102.39	13.42	90.10	82.52
Transformer-based	SegFormer [21]	85.61	92.09	78.07	76.80	89.01	3.72	1.22	91.75	84.32
Transformer-based	FuseSwin	87.00	94.00	79.56	78.86	90.93	56.94	73.98	93.03	86.07
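As a consistency check on the table above, the following sketch recomputes each method's mIoU as the unweighted mean of its five per-class IoU values and derives the percentage-point gains quoted in the abstract. All numbers are copied from the table; the names `rows`, `mean_iou`, and `gain` are illustrative only, and the assumption that mIoU is a plain average over the five listed classes is mine, not a statement from the paper.

```python
# Per-class IoU (impervious surfaces, buildings, low vegetation, trees, cars),
# followed by the reported mPA and mIoU, all in percent.
rows = {
    "PSPNet":    ([85.56, 94.06, 77.88, 78.31, 76.63], 89.92, 82.49),
    "FCN":       ([85.33, 94.23, 77.42, 77.31, 78.22], 89.96, 82.51),
    "DeepLabV3": ([85.56, 94.04, 77.79, 78.35, 78.46], 90.69, 82.84),
    "SETR":      ([82.03, 93.98, 76.72, 77.62, 77.43], 88.43, 81.56),
    "Segmenter": ([83.19, 93.92, 77.80, 78.76, 78.92], 90.10, 82.52),
    "SegFormer": ([85.61, 92.09, 78.07, 76.80, 89.01], 91.75, 84.32),
    "FuseSwin":  ([87.00, 94.00, 79.56, 78.86, 90.93], 93.03, 86.07),
}

def mean_iou(per_class):
    """Unweighted mean of the per-class IoU values."""
    return sum(per_class) / len(per_class)

def gain(method, baseline):
    """Percentage-point gain of `method` over `baseline` as (mPA, mIoU)."""
    _, mpa_a, miou_a = rows[method]
    _, mpa_b, miou_b = rows[baseline]
    return round(mpa_a - mpa_b, 2), round(miou_a - miou_b, 2)

for name, (ious, _, reported) in rows.items():
    print(f"{name:10s} mean IoU = {mean_iou(ious):.2f}  reported mIoU = {reported:.2f}")

print("FuseSwin vs DeepLabV3:", gain("FuseSwin", "DeepLabV3"))  # (2.34, 3.23)
print("FuseSwin vs SegFormer:", gain("FuseSwin", "SegFormer"))  # (1.28, 1.75)
```

The recomputed means agree with the reported mIoU to within 0.01, which is consistent with rounding of the per-class values, and the gains match the figures stated in the abstract.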

Table: Comparison of segmentation methods on the Maowei Sea oyster raft images
Category	Method	PA/% (oyster raft)	PA/% (land)	IoU/% (oyster raft)	IoU/% (land)	Parameters/MB	Computation/GFLOPs	mPA/%	mIoU/%
CNN-based	FCN [12]	84.32	98.23	70.56	95.84	47.13	5.49	91.28	83.20
CNN-based	PSPNet [14]	82.63	97.94	69.85	96.94	46.60	5.32	90.29	83.40
CNN-based	DeepLabV3 [17]	84.45	94.68	82.12	92.13	65.74	6.36	89.57	87.13
Transformer-based	SETR [19]	86.85	95.20	72.37	95.59	310.65	40.66	91.03	83.98
Transformer-based	Segmenter [20]	90.64	97.31	81.56	93.20	102.39	13.42	93.98	87.38
Transformer-based	SegFormer [21]	91.86	95.19	88.76	92.74	3.72	1.22	93.53	90.75
Transformer-based	FuseSwin	96.21	98.11	91.70	96.34	56.94	73.98	97.16	94.02

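Table: Ablation results for the AEM, multi-scale feature fusion, and ASPP components
Experiment	AEM	Multi-scale feature fusion	ASPP	mPA/%	mIoU/%
①	×	√	√	96.41	93.20
②	√	×	√	89.60	81.11
③	√	√	×	96.80	93.78
④	√	×	×	79.63	75.56
⑤	√	√	√	97.16	94.02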
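One way to read the ablation rows is to measure how far each single-component removal falls below the full model (experiment ⑤). The short sketch below does only that arithmetic on the values in the table; the names `full`, `ablations`, and `drop` are illustrative, and the ranking it prints is a reading of these numbers rather than a claim made by the authors.

```python
# (mPA, mIoU) in percent, copied from the ablation table.
full = (97.16, 94.02)                              # experiment 5: all components
ablations = {
    "without AEM": (96.41, 93.20),                 # experiment 1
    "without multi-scale fusion": (89.60, 81.11),  # experiment 2
    "without ASPP": (96.80, 93.78),                # experiment 3
}

def drop(variant):
    """Percentage-point drop of an ablated variant relative to the full model."""
    mpa, miou = ablations[variant]
    return round(full[0] - mpa, 2), round(full[1] - miou, 2)

# Sort by mIoU drop, largest first.
for name in sorted(ablations, key=lambda n: drop(n)[1], reverse=True):
    d_mpa, d_miou = drop(name)
    print(f"{name:28s} mPA -{d_mpa:.2f}  mIoU -{d_miou:.2f}")
```

On these numbers, removing multi-scale feature fusion costs 12.91 points of mIoU, compared with 0.82 for the AEM and 0.24 for ASPP.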
