Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (3): 737-744. DOI: 10.11772/j.issn.1001-9081.2023040439

• Artificial intelligence •

Semantic segmentation method for remote sensing images based on multi-scale feature fusion

Ning WU1,2, Yangyang LUO1, Huajie XU1,3

1. School of Computer, Electronics and Information, Guangxi University, Nanning, Guangxi 530004, China
  2. Guangxi Key Laboratory of Marine Engineering Equipment and Technology (Beibu Gulf University), Qinzhou, Guangxi 535011, China
  3. Guangxi Key Laboratory of Multimedia Communications and Network Technology (Guangxi University), Nanning, Guangxi 530004, China
  • Received: 2023-04-18  Revised: 2023-06-26  Accepted: 2023-06-30  Online: 2023-12-04  Published: 2024-03-10
  • Contact: Huajie XU
  • About author: WU Ning, born in 1980, Ph. D., research fellow. His research interests include image processing, pattern recognition, and machine vision.
    LUO Yangyang, born in 1998, M. S. candidate. Her research interests include semantic segmentation and deep learning.
  • Supported by:
    Science and Technology Plan Project of Chongzuo (FB2018001)


Abstract:

To improve the accuracy of semantic segmentation for remote sensing images and to address the loss of small-sized target information during feature extraction by a Deep Convolutional Neural Network (DCNN), a semantic segmentation method based on multi-scale feature fusion, named FuseSwin, was proposed. Firstly, an Attention Enhancement Module (AEM) was introduced into the Swin Transformer to highlight the target area and suppress background noise. Secondly, a Feature Pyramid Network (FPN) was used to fuse the detailed information and high-level semantic information of the multi-scale features, complementing the features of the target. Finally, an Atrous Spatial Pyramid Pooling (ASPP) module was applied to the fused feature map to capture the contextual information of the target and further improve segmentation accuracy. Experimental results demonstrate that the proposed method outperforms current mainstream segmentation methods: on the Potsdam remote sensing dataset, its mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU) are 2.34 and 3.23 percentage points higher than those of the DeepLabV3 method, and 1.28 and 1.75 percentage points higher than those of the SegFormer method. In addition, the proposed method was applied to the identification and segmentation of oyster rafts in high-resolution remote sensing images of the Maowei Sea in Qinzhou, Guangxi, achieving a Pixel Accuracy (PA) of 96.21% and an Intersection over Union (IoU) of 91.70%.
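The abstract fully specifies the pipeline (attention-enhanced multi-scale Swin Transformer features, FPN-style top-down fusion, then ASPP over the fused map), though no implementation is published with it. Below is a minimal PyTorch sketch of that pipeline under stated assumptions: the AEM internals (here a simple channel-and-spatial gate), the Swin-T stage widths (96/192/384/768), the FPN width of 256, and the ASPP dilation rates are illustrative guesses, not the authors' design; num_classes=6 matches the six Potsdam land-cover categories.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AEM(nn.Module):
    """Hypothetical Attention Enhancement Module: a channel gate followed by
    a spatial gate re-weights backbone features so that target regions are
    emphasized and background responses are suppressed (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_gate(x)          # per-channel re-weighting
        return x * self.spatial_gate(x)       # per-location re-weighting

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convolutions
    gather context at several receptive-field sizes; a 1x1 convolution
    projects the concatenated branches back to the working width."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class FuseSwinSketch(nn.Module):
    """AEM on each backbone stage -> FPN top-down fusion -> ASPP -> classifier.
    The Swin Transformer backbone itself is omitted; this module consumes its
    four stage outputs, finest resolution first."""
    def __init__(self, in_channels=(96, 192, 384, 768), fpn_ch=256, num_classes=6):
        super().__init__()
        self.aems = nn.ModuleList(AEM(c) for c in in_channels)
        self.laterals = nn.ModuleList(nn.Conv2d(c, fpn_ch, 1) for c in in_channels)
        self.smooth = nn.Conv2d(fpn_ch, fpn_ch, 3, padding=1)
        self.aspp = ASPP(fpn_ch, fpn_ch)
        self.classifier = nn.Conv2d(fpn_ch, num_classes, 1)

    def forward(self, feats):
        feats = [aem(f) for aem, f in zip(self.aems, feats)]
        maps = [lat(f) for lat, f in zip(self.laterals, feats)]
        for i in range(len(maps) - 1, 0, -1):   # top-down: coarse -> fine
            maps[i - 1] = maps[i - 1] + F.interpolate(
                maps[i], size=maps[i - 1].shape[-2:], mode="nearest")
        fused = self.smooth(maps[0])            # finest fused feature map
        return self.classifier(self.aspp(fused))

if __name__ == "__main__":
    # Stand-ins for the four Swin-T stage outputs of a 512x512 input.
    feats = [torch.randn(1, c, s, s)
             for c, s in zip((96, 192, 384, 768), (128, 64, 32, 16))]
    print(FuseSwinSketch()(feats).shape)        # torch.Size([1, 6, 128, 128])

Classifying on the finest fused level rather than the coarsest is what lets the small-target detail recovered by the FPN fusion survive to the prediction; at inference the logits would still be upsampled 4x to input resolution.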

Key words: remote sensing image, semantic segmentation, multi-scale, feature fusion, Swin Transformer

