Official website of Journal of Computer Applications

Small target detection method in UAV images based on dilated convolution fusion Transformer

WANG Lin1,2, LIU Jingliang2, WANG Wuwei3

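  1. School of Data Science and Computer Science, Xiamen Institute of Technology, Xiamen, Fujian 361021
  2. School of Automation and Information Engineering, Xi'an University of Technology, Xi'an 710048
  3. School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121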

  • Received: 2023-11-15 Revised: 2023-12-22 Accepted: 2023-12-22 Online: 2024-01-04 Published: 2024-01-04
  • Corresponding author: LIU Jingliang
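  • About the authors: WANG Lin (born 1963), male, from Dongtai, Jiangsu, Ph. D., professor; research interests: computer vision. LIU Jingliang (born 1997), male, from Kaifeng, Henan, M. S. candidate; research interests: computer vision. WANG Wuwei (born 1991), male, from Dongtai, Jiangsu, Ph. D., associate professor; research interests: machine vision, deep learning, battlefield environment perception, vision-based guidance systems.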
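  • Funding:
    National Natural Science Foundation of China (62202376); Shaanxi Provincial Association for Science and Technology Youth Talent Support Program (20220129); Special Scientific Research Plan of Shaanxi Provincial Department of Education (22JK0565); Start-up Project of the Institute of Science and Technology, Xiamen Institute of Technology (KYYKT202301)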

Small target detection method in UAV images based on dilated convolution fusion Transformer

WANG Lin1,2, LIU Jingliang2*, WANG Wuwei3

  • Received: 2023-11-15 Revised: 2023-12-22 Accepted: 2023-12-22 Online: 2024-01-04 Published: 2024-01-04
  • About the authors: WANG Lin, born in 1963, Ph. D., professor. His research interests include computer vision. LIU Jingliang, born in 1997, M. S. candidate. His research interests include computer vision. WANG Wuwei, born in 1991, Ph. D., associate professor. His research interests include machine vision, deep learning, battlefield environment perception and vision-based guidance systems.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (62202376), the Shaanxi Provincial Association for Science and Technology Youth Talent Support Program (20220129), the Special Scientific Research Plan of Shaanxi Provincial Department of Education (22JK0565), and the Start-up Project of the Institute of Science and Technology, Xiamen Institute of Technology (KYYKT202301).

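Abstract: To address the complex scenes, diverse scales, dense small targets and severe occlusion found in Unmanned Aerial Vehicle (UAV) aerial images, a Swin-Det object detection algorithm for UAV images based on multi-scale dilated convolution was proposed. Firstly, Swin Transformer was adopted as the backbone feature extraction network, and a spatial information blending module was introduced into the backbone to address the blurring of target information caused by occlusion between objects. Secondly, a fusion dilated feature pyramid network was proposed, which fuses feature information through multi-branch dilated convolutions, effectively enlarging the receptive field of the network and increasing the reuse of feature information so that the model can learn detailed features of different dimensions. Finally, linear interpolation and a multi-task loss function were adopted to address prediction-region mismatch and sample imbalance, improving the detection accuracy of the model. Experimental results on the VisDrone dataset show that the proposed algorithm achieves a mean Average Precision (mAP) of 27.2%, 4.1 percentage points higher than that of the original Swin Transformer, and converges faster under the same training batch setting. The proposed algorithm achieves high-precision detection of targets in UAV images in complex scenes.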

Keywords: small-target detection, feature fusion, dilated convolution, UAV images, Swin Transformer
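The abstract states that the fused dilated feature pyramid enlarges the receptive field by fusing multi-branch dilated convolutions, but it does not give kernel sizes, dilation rates, or the fusion operator. The sketch below is a minimal single-channel illustration under stated assumptions: 3×3 kernels, hypothetical dilation rates (1, 2, 3), and element-wise averaging as the fusion step. The function names `effective_kernel_size`, `dilated_conv2d` and `fuse_branches` are hypothetical, and this is not the authors' module. It shows the key property: a k×k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side while still using only k×k weights.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def effective_kernel_size(k: int, d: int) -> int:
    """Side length covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x: Matrix, w: Matrix, d: int) -> Matrix:
    """Single-channel dilated cross-correlation with stride 1.

    Zero padding of (effective size - 1) // 2 keeps the output the same size
    as the input for odd k (illustrative, not an optimized implementation).
    """
    h, wd = len(x), len(x[0])
    k = len(w)
    pad = (effective_kernel_size(k, d) - 1) // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    # Kernel taps are spaced d pixels apart instead of 1.
                    r, c = i - pad + u * d, j - pad + v * d
                    if 0 <= r < h and 0 <= c < wd:
                        acc += w[u][v] * x[r][c]
            out[i][j] = acc
    return out


def fuse_branches(x: Matrix, kernels: Sequence[Matrix], rates: Sequence[int]) -> Matrix:
    """Run one dilated branch per rate and fuse by element-wise mean (assumed fusion)."""
    branches = [dilated_conv2d(x, w, d) for w, d in zip(kernels, rates)]
    n = len(branches)
    return [
        [sum(b[i][j] for b in branches) / n for j in range(len(x[0]))]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    rates = [1, 2, 3]  # hypothetical dilation rates, not from the paper
    print([effective_kernel_size(3, d) for d in rates])  # [3, 5, 7]
    # A single bright pixel reveals which positions each branch reads.
    impulse = [[0.0] * 9 for _ in range(9)]
    impulse[4][4] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    fused = fuse_branches(impulse, [ones] * len(rates), rates)
    print(sum(1 for row in fused for v in row if v != 0.0))  # 25
```

With these assumed settings the three branches read 3×3, 5×5 and 7×7 windows using nine weights each, so the fused map mixes context at several scales without downsampling, which is the property the abstract relies on for small, densely packed targets.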

Abstract: Aiming at the problems of complex target scenes, diverse scales, dense small targets and severe occlusion in UAV aerial images, a Swin-Det UAV image target detection algorithm based on multi-scale dilated convolution was proposed. Firstly, Swin Transformer was used as the backbone feature extraction network, and a spatial information blending module was introduced into the backbone network to solve the problem of target information blurred by occlusion between objects. Secondly, a fusion dilated feature pyramid network was proposed, which fuses feature information through multi-branch dilated convolutions, effectively enlarging the receptive field of the network and improving the reuse of feature information, so that the model can learn detailed features of different dimensions. Finally, linear interpolation and a multi-task loss function were used to solve the problems of prediction-region mismatch and sample imbalance, improving the detection accuracy of the model. Experimental results on the VisDrone dataset show that the mean Average Precision (mAP) of the proposed algorithm reaches 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and that the proposed algorithm converges faster under the same training batch setting. The proposed algorithm can achieve high-precision detection of UAV image targets in complex scenarios.

Key words: small-target detection, feature fusion, dilated convolution, UAV images, Swin Transformer
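The abstract credits linear interpolation with fixing prediction-region mismatch but does not name the operator. One common reading, which is an assumption here, is RoI Align-style sampling: instead of rounding a proposal box to integer feature-map cells, feature values are read at fractional coordinates by bilinear interpolation and averaged within each output bin. The minimal single-channel sketch below illustrates that idea; `bilinear_sample` and `roi_align` are hypothetical names, the (y1, x1, y2, x2) box format in feature-map coordinates, border clamping, and the 2×2 sampling grid are assumptions, and it is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def bilinear_sample(f: Matrix, y: float, x: float) -> float:
    """Read feature map f at fractional (y, x) by bilinear interpolation.

    Coordinates are clamped to the map border (a simplifying assumption).
    """
    h, w = len(f), len(f[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Weighted average of the four neighbouring cells.
    return ((1 - dy) * (1 - dx) * f[y0][x0] + (1 - dy) * dx * f[y0][x1]
            + dy * (1 - dx) * f[y1][x0] + dy * dx * f[y1][x1])


def roi_align(f: Matrix, box: Tuple[float, float, float, float],
              out_h: int, out_w: int, samples: int = 2) -> Matrix:
    """Pool a box into an out_h x out_w grid without rounding its coordinates."""
    y1, x1, y2, x2 = box
    bin_h, bin_w = (y2 - y1) / out_h, (x2 - x1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # samples x samples evenly spaced points inside each bin.
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    acc += bilinear_sample(f, y, x)
            row.append(acc / (samples * samples))
        out.append(row)
    return out


if __name__ == "__main__":
    # On a map that is linear in (row, col), bilinear reads are exact.
    fmap = [[r + 10.0 * c for c in range(5)] for r in range(5)]
    print(bilinear_sample(fmap, 1.5, 2.25))  # 24.0
    print(roi_align(fmap, (1.0, 1.0, 3.0, 3.0), 1, 1))  # [[22.0]]
```

Because the box corners are never rounded, a small target whose proposal covers only a few feature cells is not displaced by quantization; under the assumption above, this is the kind of region mismatch the abstract describes.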
