Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3595-3602. DOI: 10.11772/j.issn.1001-9081.2023111575
Small target detection method in UAV images based on fusion of dilated convolution and Transformer

Lin WANG 1,2, Jingliang LIU 2, Wuwei WANG 3
Received: 2023-11-16
Revised: 2023-12-22
Accepted: 2023-12-22
Online: 2024-01-04
Published: 2024-11-10
Contact: Jingliang LIU
About author: WANG Lin, born in 1963 in Dongtai, Jiangsu, Ph.D., professor. His research interests include computer vision.
Abstract: To address the problems of complex scenes, diverse target scales, dense small targets, and severe occlusion in Unmanned Aerial Vehicle (UAV) aerial images, a UAV image target detection algorithm based on multi-scale dilated convolution, named Swin-Det, was proposed. First, Swin Transformer was adopted as the backbone feature extraction network, and a Spatial Information Blending Module (SIBM) was introduced into the backbone to resolve the blurring of target information caused by mutual occlusion between objects. Second, a Fused Dilated Feature Pyramid Network (FDFPN) was proposed, which fuses feature information through multi-branch dilated convolutions, effectively enlarging the receptive field and improving feature reuse, so that the model can learn detailed features of different dimensions. Finally, linear interpolation and a multi-task loss function were adopted to solve the problems of prediction region mismatch and sample imbalance, improving the detection accuracy of the model. Experimental results on the VisDrone dataset show that Swin-Det achieves a mean Average Precision (mAP) of 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and that it converges faster under the same training batches. Therefore, Swin-Det can achieve high-precision detection of targets in UAV images in complex scenes.
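The FDFPN implementation itself is not provided on this page. As a rough illustration of the multi-branch dilated-convolution fusion described in the abstract, the following is a minimal PyTorch sketch; the module name `DilatedFusionBlock`, the BatchNorm/ReLU branch layout, and the concatenation + 1×1 convolution fusion are assumptions for illustration, not the authors' implementation. The default rates (1, 3, 5) correspond to the best-performing FDFPN rate set in Tab. 1 below.

```python
# Minimal sketch of a multi-branch dilated-convolution fusion block
# (hypothetical; branch layout and fusion strategy are assumptions,
# not the Swin-Det/FDFPN implementation from the paper).
import torch
import torch.nn as nn


class DilatedFusionBlock(nn.Module):
    """Fuse features from parallel 3x3 convolutions with different dilation rates."""

    def __init__(self, in_channels: int, out_channels: int, rates=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding = rate keeps the spatial size unchanged for a 3x3 kernel
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution fuses the concatenated branch outputs back to out_channels
        self.fuse = nn.Conv2d(out_channels * len(rates), out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    block = DilatedFusionBlock(in_channels=256, out_channels=256, rates=(1, 3, 5))
    y = block(torch.randn(1, 256, 64, 64))
    print(y.shape)  # torch.Size([1, 256, 64, 64])
```

Because each branch pads by its own dilation rate, all branch outputs keep the input resolution, which makes channel-wise concatenation straightforward before the fusion convolution.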
Lin WANG, Jingliang LIU, Wuwei WANG. Small target detection method in UAV images based on fusion of dilated convolution and Transformer[J]. Journal of Computer Applications, 2024, 44(11): 3595-3602.
| Experiment No. | SIBM dilation rate 1 | SIBM dilation rate 2 | SIBM dilation rate 3 | FDFPN dilation rates {1,2,3} | FDFPN dilation rates {1,3,5} | FDFPN dilation rates {1,5,7} | mAP/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | √ | × | × | √ | × | × | 26.7 |
| 2 | × | √ | × | × | √ | × | 27.0 |
| 3 | × | × | √ | × | × | √ | 26.3 |
| 4 | × | √ | × | √ | × | × | 26.8 |
| 5 | × | √ | × | × | × | √ | 26.5 |

Tab. 1 Comparison of results with different dilation rates
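For intuition about why the rate sets in Tab. 1 differ, the receptive field of 3×3 dilated convolutions can be worked out directly: a single layer with dilation d has an effective kernel of 2d + 1, and stacking stride-1 layers adds 2d per layer. Whether FDFPN applies the rates in a set as parallel branches or as stacked layers is not stated on this page, so the short script below (illustrative only, not from the paper) shows both views.

```python
# Effective kernel size of a single 3x3 convolution with dilation d is
# k + (k - 1) * (d - 1) = 2d + 1; stacking stride-1 layers adds (k - 1) * d
# per layer. Both views are shown purely for intuition.
def effective_kernel(d, k=3):
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(rates, k=3):
    return 1 + sum((k - 1) * d for d in rates)

for rates in [(1, 2, 3), (1, 3, 5), (1, 5, 7)]:
    print(rates,
          [effective_kernel(d) for d in rates],  # per-branch view
          stacked_receptive_field(rates))        # stacked view
# (1, 2, 3) [3, 5, 7] 13
# (1, 3, 5) [3, 7, 11] 19
# (1, 5, 7) [3, 11, 15] 27
```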
| Algorithm | SIBM | FDFPN | Linear interpolation | mAP/% | APS/% | APM/% | APL/% | Params/MB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | × | × | × | 23.1 | 15.8 | 33.7 | 36.2 | 38.6 |
| A | √ | × | × | 24.3 | 16.9 | 34.4 | 36.7 | 39.5 |
| B | × | √ | × | 24.6 | 17.1 | 35.6 | 38.2 | 42.1 |
| C | √ | √ | × | 27.0 | 19.4 | 37.0 | 41.3 | 47.4 |
| D | √ | √ | √ | 27.2 | 19.5 | 37.4 | 41.4 | 47.4 |

Tab. 2 Comparison of ablation experimental results
| Algorithm | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning tricycle | Bus | Motor | mAP/% | AP50/% | AP75/% | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSD | 13.0 | 7.9 | 3.7 | 45.3 | 19.7 | 11.4 | 9.2 | 4.2 | 27.7 | 12.8 | 15.5 | 27.3 | 15.1 | 37.0 |
| FPN | 14.8 | 9.4 | 5.5 | 42.4 | 23.6 | 16.3 | 12.2 | 7.0 | 32.6 | 13.4 | 17.7 | 33.4 | 15.9 | 24.0 |
| IterDet | 16.5 | 12.1 | 6.8 | 48.7 | 28.4 | 19.0 | 11.4 | 7.2 | 35.4 | 18.7 | 20.4 | 36.8 | 20.3 | 11.2 |
| Faster R-CNN | 20.9 | 14.8 | 7.3 | 51.0 | 30.2 | 19.8 | 14.0 | 8.1 | 35.5 | 21.1 | 22.3 | 39.0 | 21.7 | 25.4 |
| YOLOv5s | 16.2 | 8.2 | 7.2 | 50.6 | 31.4 | 27.9 | 14.4 | 14.1 | 41.4 | 15.7 | 22.7 | 40.8 | 22.5 | 90.0 |
| Swin-T | 23.0 | 14.0 | 9.3 | 49.6 | 30.2 | 22.8 | 16.0 | 6.7 | 37.4 | 22.3 | 23.1 | 42.5 | 22.0 | 21.3 |
| DDETR | 25.5 | 14.1 | 10.6 | 53.0 | 36.9 | 25.2 | 15.3 | 6.7 | 38.3 | 22.4 | 24.8 | 42.7 | 25.1 | 19.7 |
| SyNet | 26.2 | 15.3 | 11.1 | 50.2 | 33.0 | 23.9 | 16.4 | 8.6 | 39.1 | 26.9 | 25.1 | 48.4 | 26.2 | 16.0 |
| Swin-Det | 28.8 | 18.6 | 12.4 | 54.7 | 35.4 | 25.1 | 19.2 | 9.1 | 42.2 | 26.8 | 27.2 | 50.7 | 28.3 | 18.4 |

Tab. 3 Comparison of detection results of different algorithms (class columns are per-class AP/%)
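The mAP, AP50, and AP75 columns in Tab. 3 follow the COCO-style evaluation protocol commonly used on VisDrone. Below is a minimal evaluation sketch with pycocotools; the file names are hypothetical placeholders, and the VisDrone ground truth is assumed to have been converted to COCO-format JSON beforehand.

```python
# COCO-style evaluation sketch (pycocotools); file paths are hypothetical
# placeholders, and VisDrone annotations are assumed to be pre-converted
# to COCO-format JSON.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("visdrone_val_coco.json")             # ground-truth annotations
coco_dt = coco_gt.loadRes("swin_det_results.json")   # detections: [{image_id, category_id, bbox, score}, ...]

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
# evaluator.stats[0] -> mAP (IoU 0.50:0.95), stats[1] -> AP50, stats[2] -> AP75
```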
References:
1. LIANG J, CHEN X, LIANG C, et al. A detection approach for late-autumn shoots of litchi based on Unmanned Aerial Vehicle (UAV) remote sensing[J]. Computers and Electronics in Agriculture, 2023, 204: No.107535.
2. SILVA L A, LEITHARDT V R Q, BATISTA V F L, et al. Automated road damage detection using UAV images and deep learning techniques[J]. IEEE Access, 2023, 11: 62918-62931.
3. XUE Y, JIN G, SHEN T, et al. Template-guided frequency attention and adaptive cross-entropy loss for UAV visual tracking[J]. Chinese Journal of Aeronautics, 2023, 36(9): 299-312.
4. LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
5. LIU Y J, YANG F B, HU P. Parallel FPN algorithm based on Cascade R-CNN for object detection from UAV aerial images[J]. Laser and Optoelectronics Progress, 2020, 57(20): No.201505. (in Chinese)
6. HONG S, KANG S, CHO D. Patch-level augmentation for object detection in aerial images[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2019: 127-134.
7. LIANG B, SU J, FENG K, et al. Cross-layer triple-branch parallel fusion network for small object detection in UAV images[J]. IEEE Access, 2023, 11: 39738-39750.
8. QIAO S, CHEN L C, YUILLE A. DetectoRS: detecting objects with recursive feature pyramid and switchable atrous convolution[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10208-10219.
9. BEERY S, WU G, RATHOD V, et al. Context R-CNN: long term temporal context for per-camera object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13072-13082.
10. TIAN T T, YANG J. Object detection for remote sensing image based on multiscale feature fusion network[J]. Laser and Optoelectronics Progress, 2022, 59(16): No.1628003. (in Chinese)
11. YANG X, YANG J, YAN J, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8231-8240.
12. TIAN Y L, WANG Y T, WANG J G, et al. Key problems and progress of vision Transformers: the state of the art and prospects[J]. Acta Automatica Sinica, 2022, 48(4): 957-979. (in Chinese)
13. DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-10-22].
14. LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
15. XU Y, YANG Y, ZHANG L. DeMT: deformable mixer Transformer for multi-task learning of dense prediction[C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2023: 3072-3080.
16. HE X, ZHOU Y, ZHAO J, et al. Swin Transformer embedding UNet for remote sensing image semantic segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: No.4408715.
17. JIANG X, WU Y. Remote sensing object detection based on convolution and Swin Transformer[J]. IEEE Access, 2023, 11: 38643-38656.
18. FU M M, DENG M L, ZHANG D X. Object detection algorithms based on deep learning and Transformer[J]. Computer Engineering and Applications, 2023, 59(1): 37-48. (in Chinese)
19. WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 1451-1460.
20. LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
21. RUKHOVICH D, SOFIIUK K, GALEEV D, et al. IterDet: iterative scheme for object detection in crowded environments[C]// Proceedings of the 2020 Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), LNCS 12644. Cham: Springer, 2021: 344-354.
22. XU C, WANG J, YANG W, et al. RFLA: Gaussian receptive field based label assignment for tiny object detection[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13669. Cham: Springer, 2022: 526-543.
23. XU J, XIE Z G, LI H J. Feature-balanced UAV aerial image target detection algorithm[J]. Computer Engineering and Applications, 2023, 59(6): 196-203. (in Chinese)
24. ZHU X, SU W, LU L, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. [2022-11-14].
25. ALBABA B M, OZER S. SyNet: an ensemble network for object detection in UAV images[C]// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE, 2021: 10227-10234.