Small target detection method in UAV images based on dilated convolution fusion Transformer

doi:10.11772/j.issn.1001-9081.2023111575

Journal of Computer Applications (official website)

WANG Lin^1,2, LIU Jingliang^2*, WANG Wuwei^3

1. School of Data Science and Computer Science, Xiamen Institute of Technology, Xiamen Fujian 361021, China; 2. School of Automation and Information Engineering, Xi'an University of Technology, Xi'an Shaanxi 710048, China; 3. School of Automation, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi 710121, China

Received: 2023-11-15 Revised: 2023-12-22 Accepted: 2023-12-22 Online: 2024-01-04 Published: 2024-01-04
Corresponding author: LIU Jingliang
About the authors: WANG Lin, born in 1963 in Dongtai, Jiangsu, Ph. D., professor. His research interests include computer vision. LIU Jingliang, born in 1997 in Kaifeng, Henan, M. S. candidate. His research interests include computer vision. WANG Wuwei, born in 1991 in Dongtai, Jiangsu, Ph. D., associate professor. His research interests include machine vision, deep learning, battlefield environment perception, and guidance systems based on visual information.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (62202376), the Youth Talent Support Program of the Shaanxi Association for Science and Technology (20220129), the Special Scientific Research Plan of Shaanxi Provincial Department of Education (22JK0565), and the Start-up Project of the Institute of Science and Technology, Xiamen Institute of Technology (KYYKT202301).

Abstract

Abstract: To address the complex scenes, diverse target scales, dense small targets and severe occlusion found in UAV aerial images, Swin-Det, a UAV image target detection algorithm based on multi-scale dilated convolution, was proposed. Firstly, Swin Transformer was adopted as the backbone feature extraction network, and a spatial information fusion module was introduced into the backbone to alleviate the blurring of target information caused by occlusion between objects. Secondly, a fused dilated feature pyramid network was proposed, in which feature information was fused through multi-branch dilated convolutions, enlarging the receptive field of the network and increasing the reuse of feature information, so that the model could learn detailed features of different dimensions. Finally, linear interpolation and a multi-task loss function were used to address prediction region mismatch and sample imbalance, improving the detection accuracy of the model. Experimental results on the VisDrone dataset show that the proposed algorithm reaches a mean Average Precision (mAP) of 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and that it converges faster under the same training batch. The proposed algorithm can detect targets in UAV images with high precision in complex scenes.

Key words: small target detection, feature fusion, dilated convolution, UAV image, Swin Transformer
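
The abstract states that multi-branch dilated convolutions enlarge the receptive field, but it gives neither the dilation rates nor the fusion operator. The following is a minimal pure-Python sketch of the general idea only, not the Swin-Det implementation: the rates (1, 2, 3), the 1-D setting, the shared kernel and fusion by element-wise summation are all illustrative assumptions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D dilated convolution (cross-correlation), zero padded."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = (effective_kernel_size(k, dilation) - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    # Kernel taps are spaced `dilation` samples apart.
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def multi_branch_fusion(signal, kernel, dilations=(1, 2, 3)):
    """One branch per dilation rate, fused by element-wise sum (an assumption)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(f"dilation={d}: effective kernel size {effective_kernel_size(3, d)}")
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    print(multi_branch_fusion(impulse, [1.0, 1.0, 1.0]))
```

With a 3-tap kernel, the three branches cover spans of 3, 5 and 7 samples. A single impulse therefore influences a wider neighbourhood in the fused output than in an undilated branch alone, while the number of weights per branch stays the same.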

CLC Number: TP391.4

WANG Lin, LIU Jingliang, WANG Wuwei. Small target detection method in UAV images based on dilated convolution fusion Transformer[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2023111575.
