Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3595-3602. DOI: 10.11772/j.issn.1001-9081.2023111575
Small target detection method in UAV images based on fusion of dilated convolution and Transformer

Lin WANG 1,2, Jingliang LIU 2, Wuwei WANG 3
Received: 2023-11-16
Revised: 2023-12-22
Accepted: 2023-12-22
Online: 2024-01-04
Published: 2024-11-10
Contact: Jingliang LIU
About author: WANG Lin, born in 1963 in Dongtai, Jiangsu, Ph.D., professor. His research interests include computer vision.
Abstract: To address the problems of complex scenes, diverse target scales, dense small targets, and severe occlusion in Unmanned Aerial Vehicle (UAV) aerial images, a UAV image target detection algorithm based on multi-scale dilated convolution, named Swin-Det, was proposed. First, Swin Transformer was adopted as the backbone feature extraction network, and a Spatial Information Blending Module (SIBM) was introduced into the backbone to resolve the blurring of target information caused by mutual occlusion between objects. Second, a Fused Dilated Feature Pyramid Network (FDFPN) was proposed, which fuses feature information through multi-branch dilated convolutions, effectively enlarging the receptive field and improving feature reuse, so that the model can learn detailed features of different dimensions. Finally, linear interpolation and a multi-task loss function were adopted to solve the problems of prediction region mismatch and sample imbalance, improving the detection accuracy of the model. Experimental results on the VisDrone dataset show that Swin-Det achieves a mean Average Precision (mAP) of 27.2%, which is 4.1 percentage points higher than that of the original Swin Transformer, and that it converges faster under the same training batches. Therefore, Swin-Det can achieve high-precision detection of targets in UAV images in complex scenes.
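The FDFPN implementation itself is not provided on this page. As a rough illustration of the multi-branch dilated-convolution fusion described in the abstract, the following is a minimal PyTorch sketch; the module name `DilatedFusionBlock`, the BatchNorm/ReLU branch layout, and the concatenation + 1×1 convolution fusion are assumptions for illustration, not the authors' implementation. The default rates (1, 3, 5) correspond to the best-performing FDFPN rate set in Tab. 1 below.

```python
# Minimal sketch of a multi-branch dilated-convolution fusion block
# (hypothetical; branch layout and fusion strategy are assumptions,
# not the Swin-Det/FDFPN implementation from the paper).
import torch
import torch.nn as nn


class DilatedFusionBlock(nn.Module):
    """Fuse features from parallel 3x3 convolutions with different dilation rates."""

    def __init__(self, in_channels: int, out_channels: int, rates=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding = rate keeps the spatial size unchanged for a 3x3 kernel
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 convolution fuses the concatenated branch outputs back to out_channels
        self.fuse = nn.Conv2d(out_channels * len(rates), out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    block = DilatedFusionBlock(in_channels=256, out_channels=256, rates=(1, 3, 5))
    y = block(torch.randn(1, 256, 64, 64))
    print(y.shape)  # torch.Size([1, 256, 64, 64])
```

Because each branch pads by its own dilation rate, all branch outputs keep the input resolution, which makes channel-wise concatenation straightforward before the fusion convolution.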
Lin WANG, Jingliang LIU, Wuwei WANG. Small target detection method in UAV images based on fusion of dilated convolution and Transformer[J]. Journal of Computer Applications, 2024, 44(11): 3595-3602.
| Experiment No. | SIBM dilation rate 1 | SIBM dilation rate 2 | SIBM dilation rate 3 | FDFPN dilation rates {1,2,3} | FDFPN dilation rates {1,3,5} | FDFPN dilation rates {1,5,7} | mAP/% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | √ | × | × | √ | × | × | 26.7 |
| 2 | × | √ | × | × | √ | × | 27.0 |
| 3 | × | × | √ | × | × | √ | 26.3 |
| 4 | × | √ | × | √ | × | × | 26.8 |
| 5 | × | √ | × | × | × | √ | 26.5 |

Tab. 1 Comparison of results with different dilation rates
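For intuition about why the rate sets in Tab. 1 differ, the receptive field of 3×3 dilated convolutions can be worked out directly: a single layer with dilation d has an effective kernel of 2d + 1, and stacking stride-1 layers adds 2d per layer. Whether FDFPN applies the rates in a set as parallel branches or as stacked layers is not stated on this page, so the short script below (illustrative only, not from the paper) shows both views.

```python
# Effective kernel size of a single 3x3 convolution with dilation d is
# k + (k - 1) * (d - 1) = 2d + 1; stacking stride-1 layers adds (k - 1) * d
# per layer. Both views are shown purely for intuition.
def effective_kernel(d, k=3):
    return k + (k - 1) * (d - 1)

def stacked_receptive_field(rates, k=3):
    return 1 + sum((k - 1) * d for d in rates)

for rates in [(1, 2, 3), (1, 3, 5), (1, 5, 7)]:
    print(rates,
          [effective_kernel(d) for d in rates],  # per-branch view
          stacked_receptive_field(rates))        # stacked view
# (1, 2, 3) [3, 5, 7] 13
# (1, 3, 5) [3, 7, 11] 19
# (1, 5, 7) [3, 11, 15] 27
```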
| Algorithm | SIBM | FDFPN | Linear interpolation | mAP/% | APS/% | APM/% | APL/% | Params/MB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | × | × | × | 23.1 | 15.8 | 33.7 | 36.2 | 38.6 |
| A | √ | × | × | 24.3 | 16.9 | 34.4 | 36.7 | 39.5 |
| B | × | √ | × | 24.6 | 17.1 | 35.6 | 38.2 | 42.1 |
| C | √ | √ | × | 27.0 | 19.4 | 37.0 | 41.3 | 47.4 |
| D | √ | √ | √ | 27.2 | 19.5 | 37.4 | 41.4 | 47.4 |

Tab. 2 Comparison of ablation experimental results
| Algorithm | Pedestrian | People | Bicycle | Car | Van | Truck | Tricycle | Awning tricycle | Bus | Motor | mAP/% | AP50/% | AP75/% | Frame rate/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSD | 13.0 | 7.9 | 3.7 | 45.3 | 19.7 | 11.4 | 9.2 | 4.2 | 27.7 | 12.8 | 15.5 | 27.3 | 15.1 | 37.0 |
| FPN | 14.8 | 9.4 | 5.5 | 42.4 | 23.6 | 16.3 | 12.2 | 7.0 | 32.6 | 13.4 | 17.7 | 33.4 | 15.9 | 24.0 |
| IterDet | 16.5 | 12.1 | 6.8 | 48.7 | 28.4 | 19.0 | 11.4 | 7.2 | 35.4 | 18.7 | 20.4 | 36.8 | 20.3 | 11.2 |
| Faster R-CNN | 20.9 | 14.8 | 7.3 | 51.0 | 30.2 | 19.8 | 14.0 | 8.1 | 35.5 | 21.1 | 22.3 | 39.0 | 21.7 | 25.4 |
| YOLOv5s | 16.2 | 8.2 | 7.2 | 50.6 | 31.4 | 27.9 | 14.4 | 14.1 | 41.4 | 15.7 | 22.7 | 40.8 | 22.5 | 90.0 |
| Swin-T | 23.0 | 14.0 | 9.3 | 49.6 | 30.2 | 22.8 | 16.0 | 6.7 | 37.4 | 22.3 | 23.1 | 42.5 | 22.0 | 21.3 |
| DDETR | 25.5 | 14.1 | 10.6 | 53.0 | 36.9 | 25.2 | 15.3 | 6.7 | 38.3 | 22.4 | 24.8 | 42.7 | 25.1 | 19.7 |
| SyNet | 26.2 | 15.3 | 11.1 | 50.2 | 33.0 | 23.9 | 16.4 | 8.6 | 39.1 | 26.9 | 25.1 | 48.4 | 26.2 | 16.0 |
| Swin-Det | 28.8 | 18.6 | 12.4 | 54.7 | 35.4 | 25.1 | 19.2 | 9.1 | 42.2 | 26.8 | 27.2 | 50.7 | 28.3 | 18.4 |

Tab. 3 Comparison of detection results of different algorithms (class columns are per-class AP/%)
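The mAP, AP50, and AP75 columns in Tab. 3 follow the COCO-style evaluation protocol commonly used on VisDrone. Below is a minimal evaluation sketch with pycocotools; the file names are hypothetical placeholders, and the VisDrone ground truth is assumed to have been converted to COCO-format JSON beforehand.

```python
# COCO-style evaluation sketch (pycocotools); file paths are hypothetical
# placeholders, and VisDrone annotations are assumed to be pre-converted
# to COCO-format JSON.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("visdrone_val_coco.json")             # ground-truth annotations
coco_dt = coco_gt.loadRes("swin_det_results.json")   # detections: [{image_id, category_id, bbox, score}, ...]

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
# evaluator.stats[0] -> mAP (IoU 0.50:0.95), stats[1] -> AP50, stats[2] -> AP75
```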
References:
1. LIANG J, CHEN X, LIANG C, et al. A detection approach for late-autumn shoots of litchi based on Unmanned Aerial Vehicle (UAV) remote sensing[J]. Computers and Electronics in Agriculture, 2023, 204: No.107535.
2. SILVA L A, LEITHARDT V R Q, BATISTA V F L, et al. Automated road damage detection using UAV images and deep learning techniques[J]. IEEE Access, 2023, 11: 62918-62931.
3. XUE Y, JIN G, SHEN T, et al. Template-guided frequency attention and adaptive cross-entropy loss for UAV visual tracking[J]. Chinese Journal of Aeronautics, 2023, 36(9): 299-312.
4. LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
5. LIU Y J, YANG F B, HU P. Parallel FPN algorithm based on Cascade R-CNN for object detection from UAV aerial images[J]. Laser and Optoelectronics Progress, 2020, 57(20): No.201505. (in Chinese)
6. HONG S, KANG S, CHO D. Patch-level augmentation for object detection in aerial images[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE, 2019: 127-134.
7. LIANG B, SU J, FENG K, et al. Cross-layer triple-branch parallel fusion network for small object detection in UAV images[J]. IEEE Access, 2023, 11: 39738-39750.
8. QIAO S, CHEN L C, YUILLE A. DetectoRS: detecting objects with recursive feature pyramid and switchable atrous convolution[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10208-10219.
9. BEERY S, WU G, RATHOD V, et al. Context R-CNN: long term temporal context for per-camera object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13072-13082.
10. TIAN T T, YANG J. Object detection for remote sensing image based on multiscale feature fusion network[J]. Laser and Optoelectronics Progress, 2022, 59(16): No.1628003. (in Chinese)
11. YANG X, YANG J, YAN J, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8231-8240.
12. TIAN Y L, WANG Y T, WANG J G, et al. Key problems and progress of vision Transformers: the state of the art and prospects[J]. Acta Automatica Sinica, 2022, 48(4): 957-979. (in Chinese)
13. DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03) [2022-10-22].
14. LIU Z, LIN Y, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 9992-10002.
15. XU Y, YANG Y, ZHANG L. DeMT: deformable mixer Transformer for multi-task learning of dense prediction[C]// Proceedings of the 37th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2023: 3072-3080.
16. HE X, ZHOU Y, ZHAO J, et al. Swin Transformer embedding UNet for remote sensing image semantic segmentation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: No.4408715.
17. JIANG X, WU Y. Remote sensing object detection based on convolution and Swin Transformer[J]. IEEE Access, 2023, 11: 38643-38656.
18. FU M M, DENG M L, ZHANG D X. Object detection algorithms based on deep learning and Transformer[J]. Computer Engineering and Applications, 2023, 59(1): 37-48. (in Chinese)
19. WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 1451-1460.
20. LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
21. RUKHOVICH D, SOFIIUK K, GALEEV D, et al. IterDet: iterative scheme for object detection in crowded environments[C]// Proceedings of the 2020 Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), LNCS 12644. Cham: Springer, 2021: 344-354.
22. XU C, WANG J, YANG W, et al. RFLA: Gaussian receptive field based label assignment for tiny object detection[C]// Proceedings of the 2022 European Conference on Computer Vision, LNCS 13669. Cham: Springer, 2022: 526-543.
23. XU J, XIE Z G, LI H J. Feature-balanced UAV aerial image target detection algorithm[J]. Computer Engineering and Applications, 2023, 59(6): 196-203. (in Chinese)
24. ZHU X, SU W, LU L, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. [2022-11-14].
25. ALBABA B M, OZER S. SyNet: an ensemble network for object detection in UAV images[C]// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE, 2021: 10227-10234.