Journal of Computer Applications, 2024, Vol. 44, Issue (11): 3595-3602. DOI: 10.11772/j.issn.1001-9081.2023111575

• Multimedia computing and computer simulation

Small target detection method in UAV images based on fusion of dilated convolution and Transformer

Lin WANG (1, 2), Jingliang LIU (2), Wuwei WANG (3)

  1. School of Data and Computer Science, Xiamen Institute of Technology, Xiamen, Fujian 361021, China
    2. School of Automation and Information Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China
    3. School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121, China
  • Received: 2023-11-16; Revised: 2023-12-22; Accepted: 2023-12-22; Online: 2024-01-04; Published: 2024-11-10
  • Contact: Jingliang LIU (corresponding author)
  • About the authors: WANG Lin, male, born in 1963, native of Dongtai, Jiangsu, Ph.D., professor. His research interests include computer vision.
    WANG Wuwei, male, born in 1991, native of Dongtai, Jiangsu, Ph.D., associate professor. His research interests include machine vision, deep learning, battlefield environment perception, and guidance systems based on visual information.
  • Supported by:
    National Natural Science Foundation of China (62202376); Shaanxi Association for Science and Technology Youth Talent Support Program (20220129); Special Scientific Research Program of Shaanxi Provincial Department of Education (22JK0565); Science and Technology Starting Project of Xiamen Institute of Technology (KYYKT202301)

Abstract:

To address complex scenes, widely varying target scales, densely packed small targets, and severe occlusion in Unmanned Aerial Vehicle (UAV) aerial images, a UAV image target detection algorithm based on multi-scale dilated convolution, named Swin-Det, was proposed. Firstly, Swin Transformer was used as the backbone feature extraction network, and a Spatial Information Blending Module (SIBM) was introduced into the backbone to mitigate the blurring of target information caused by occlusion between objects. Secondly, a Fusion of Dilation Feature Pyramid Network (FDFPN) was proposed to fuse feature information through multi-branch dilated convolution, which enlarges the receptive field of the network and improves the reuse of feature information, so that the model can learn detailed features across different dimensions. Finally, linear interpolation and a multi-task loss function were adopted to address prediction-region mismatch and sample imbalance, thereby improving the detection precision of the model. Experimental results on the VisDrone dataset show that Swin-Det reaches a mean Average Precision (mAP) of 27.2%, 4.1 percentage points higher than that of the original Swin Transformer, and converges faster under the same training batch. These results indicate that Swin-Det can achieve high-precision detection of UAV image targets in complex scenes.
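The receptive-field argument for FDFPN rests on a standard property of dilated convolution: a k x k kernel with dilation rate d samples a window of k + (k - 1)(d - 1) pixels per side while keeping only k² weights. The plain-Python sketch below illustrates that property and a multi-branch block whose branch outputs are fused. The three branches, their dilation rates (1, 2, 3), the 3 x 3 kernels, and fusion by element-wise summation are all assumptions chosen for illustration. The abstract does not specify FDFPN's branch count, rates, or fusion rule, so this is a toy under stated assumptions, not the authors' module.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def effective_kernel_size(k: int, d: int) -> int:
    """Side length of the window covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x: Matrix, w: Matrix, d: int) -> Matrix:
    """Single-channel 2-D dilated cross-correlation, zero-padded so the output
    has the same size as the input (assumes an odd kernel size k)."""
    rows, cols, k = len(x), len(x[0]), len(w)
    pad = (effective_kernel_size(k, d) - 1) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Kernel taps are spaced d pixels apart instead of 1.
                    r, c = i - pad + u * d, j - pad + v * d
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += w[u][v] * x[r][c]
            out[i][j] = acc
    return out


def fuse_dilated_branches(x: Matrix, branches: Sequence[Tuple[Matrix, int]]) -> Matrix:
    """Hypothetical multi-branch block: one dilated convolution per (kernel, rate)
    pair, fused by element-wise summation (an assumed fusion rule)."""
    outputs = [dilated_conv2d(x, w, d) for w, d in branches]
    return [[sum(o[i][j] for o in outputs) for j in range(len(x[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    # 7 x 7 impulse image: the nonzero footprint of each output shows which
    # input pixels the corresponding filter can "see".
    img = [[0.0] * 7 for _ in range(7)]
    img[3][3] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    for d in (1, 2, 3):
        size = effective_kernel_size(3, d)
        print(f"d={d}: effective window {size}x{size}")
    fused = fuse_dilated_branches(img, [(ones, 1), (ones, 2), (ones, 3)])
    for row in fused:
        print(" ".join(f"{v:.0f}" for v in row))
```

With 3 x 3 kernels, the three branches cover windows of 3, 5, and 7 pixels per side while each keeps 9 weights. A single dilated branch samples its window sparsely (every d-th pixel), and combining branches with different rates fills in more positions of the enlarged window. Summation is only one plausible fusion rule; concatenation followed by a 1 x 1 convolution is another common choice.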

Key words: small target detection, feature fusion, dilated convolution, Unmanned Aerial Vehicle (UAV) image, Swin Transformer
