Journal of Computer Applications


UAV remote sensing image small object detection algorithm based on improved RT-DETR

  

  • Received: 2026-01-28; Revised: 2026-04-08; Online: 2026-05-13; Published: 2026-05-13


XU Xiaoming, WANG Haosen, ZHANG Ligang

  1. Hebei University of Architecture
  • Corresponding author: ZHANG Ligang

Abstract: Object detection in Unmanned Aerial Vehicle (UAV) remote sensing images faces three problems: small-object features are sparse and easily lost, multi-scale feature fusion is computationally complex, and resource usage is hard to balance against detection accuracy. To address them, a small object detection algorithm based on an improved RT-DETR, named ZY-DETR (Zoomed-Yield Detection Transformer), is proposed. First, a heterogeneous shallow-deep backbone network, GradNet, is designed. Its shallow layers use the cross-stage partial bottleneck with two convolutions (C2f) module to retain the edge and texture details of small objects, while its deep-layer C2f-CGSA module combines the Convolutional Gated Linear Unit (CGLU) with Single-Head Self-Attention (SHSA) to refine features from local to global context at low cost. Second, an inter-scale fusion module based on Token Statistical Self-Attention (IST-Fusion) is constructed. Building on the Attention-based Intra-scale Feature Interaction (AIFI) module of RT-DETR, it compresses channel redundancy and replaces Multi-Head Self-Attention (MHSA) with the Token Statistical Self-Attention (TSSA) mechanism, aligning multi-scale features efficiently with computational complexity linear in the number of tokens. Third, a fine-grained detection head (FineGrained-Detect) is designed that takes high-resolution shallow features directly and bypasses the downsampling that destroys small-object detail; combined with a lightweight fusion unit and scale-specific detection branches, it balances detail retention against computational overhead. Under unified experimental conditions, ZY-DETR runs at 59.31 frame/s on the VisDrone2019 dataset and raises AP@[0.5:0.95] from 20.3% for the RT-DETR baseline to 23.5%, with AP gains of 3.0, 3.6, and 7.7 percentage points for small, medium, and large objects, respectively. On the DOTA test set, it achieves an average precision of 60.0% at 85.61 frame/s, with balanced precision gains across all object scales. The proposed algorithm significantly improves detection accuracy and effectively addresses the missed detection of small objects and the low efficiency of multi-scale fusion in UAV remote sensing images.
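For reference, AP@[0.5:0.95] follows the COCO convention of averaging AP over ten IoU thresholds, and the figures reported above can be restated as absolute and relative gains. The worked numbers below only re-derive values from the abstract; the per-frame times are plain reciprocals of the reported frame rates, not separately measured latencies, and the two datasets may use different input sizes, so they should not be compared directly.

```latex
\begin{aligned}
\mathrm{AP}@[0.5{:}0.95] &= \frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\dots,\,0.95\}} \mathrm{AP}_t
  && \text{($\mathrm{AP}_t$: class-averaged AP at IoU threshold $t$)}\\
\Delta\mathrm{AP} &= 23.5\% - 20.3\% = 3.2 \text{ percentage points},
  && \frac{3.2}{20.3} \approx 15.8\% \text{ relative gain}\\
\frac{1\ \text{s}}{59.31\ \text{frames}} &\approx 16.86\ \text{ms per frame (VisDrone2019)},
  && \frac{1\ \text{s}}{85.61\ \text{frames}} \approx 11.68\ \text{ms per frame (DOTA)}
\end{aligned}
```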
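The linear-complexity claim for TSSA can be illustrated with a minimal NumPy sketch. It is modelled on the publicly described Token Statistics Transformer (ToST) formulation as we understand it, not on the IST-Fusion code, which the abstract does not give: tokens are projected and split into heads, each token receives a soft membership over heads, and each feature is rescaled by a membership-weighted second-moment statistic. Every step visits each token a constant number of times, so the cost grows as O(n·c) rather than the O(n²·c) of MHSA. The names `tssa_sketch`, `w_proj`, `heads`, and `temp` are hypothetical, and random weights stand in for learned ones.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tssa_sketch(x: np.ndarray, w_proj: np.ndarray, heads: int,
                temp: float = 1.0, eps: float = 1e-8) -> np.ndarray:
    """Simplified token-statistics self-attention (hypothetical helper).

    x:      (n, c) tokens, e.g. a flattened feature map
    w_proj: (c, c) projection weights (random here, learned in practice)
    Returns an (n, c) array. No n x n attention matrix is ever built.
    """
    n, c = x.shape
    if c % heads:
        raise ValueError("channel count must be divisible by heads")
    d = c // heads
    # Project tokens and split channels into heads: (heads, n, d)
    w = (x @ w_proj).reshape(n, heads, d).transpose(1, 0, 2)
    # Normalise each feature over the token axis before computing memberships
    w_normed = w / (np.linalg.norm(w, axis=1, keepdims=True) + eps)
    # Soft membership of each token in each head (softmax over heads): (heads, n)
    pi = softmax(temp * (w_normed ** 2).sum(axis=-1), axis=0)
    # Membership-weighted second moment of every feature: (heads, 1, d)
    weights = pi / (pi.sum(axis=1, keepdims=True) + eps)
    second_moment = np.einsum("hn,hnd->hd", weights, w ** 2)[:, None, :]
    # Per-feature gain that shrinks high-energy directions
    gain = 1.0 / (1.0 + second_moment)
    out = -(w * pi[..., None]) * gain  # (heads, n, d)
    return out.transpose(1, 0, 2).reshape(n, c)


# Usage: 400 tokens (e.g. a 20 x 20 feature map) with 64 channels
rng = np.random.default_rng(0)
tokens = rng.standard_normal((400, 64))
proj = rng.standard_normal((64, 64)) / 8.0
y = tssa_sketch(tokens, proj, heads=4)
print(y.shape)  # (400, 64)
```

The only cross-token interaction is through per-feature statistics (the token-axis norms and `second_moment`), which is why no n × n map is formed and why a module of this kind can stand in for MHSA on large feature maps.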
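The CGLU component of C2f-CGSA can be pictured with the Convolutional GLU design popularised by TransNeXt, which we assume is the intended variant: a 1×1 projection produces a gate branch and a value branch, a 3×3 depthwise convolution followed by GELU injects local spatial context into the gate, and the gated product is projected back to the input width. The sketch works on a single (H, W, C) feature map in NumPy; `conv_glu`, `depthwise_conv3x3`, and the weight shapes are hypothetical, and bias, normalisation, and dropout are omitted.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def depthwise_conv3x3(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """x: (H, W, C) map, k: (3, 3, C) one kernel per channel, zero padding."""
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            # Each channel is filtered only by its own kernel (depthwise)
            out += xp[dy:dy + h, dx:dx + w, :] * k[dy, dx, :]
    return out


def conv_glu(x: np.ndarray, w1: np.ndarray, k_dw: np.ndarray,
             w2: np.ndarray) -> np.ndarray:
    """Convolutional GLU sketch.

    x: (H, W, C); w1: (C, 2*hid); k_dw: (3, 3, hid); w2: (hid, C).
    """
    hid = w2.shape[0]
    u = x @ w1  # 1x1 convolution = per-pixel linear layer
    gate, value = u[..., :hid], u[..., hid:]
    gate = gelu(depthwise_conv3x3(gate, k_dw))  # local context enters via the gate
    return (gate * value) @ w2


# Usage on an 8 x 8 map with 16 channels and a hidden width of 32
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))
w1 = rng.standard_normal((16, 64)) * 0.1
k_dw = rng.standard_normal((3, 3, 32)) * 0.1
w2 = rng.standard_normal((32, 16)) * 0.1
y = conv_glu(feat, w1, k_dw, w2)
print(y.shape)  # (8, 8, 16)
```

With a delta kernel (1 at the centre, 0 elsewhere) the depthwise convolution becomes the identity and the block reduces to a plain GELU-gated linear unit, a quick check that the convolution only alters the gate branch.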
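The SHSA component can likewise be sketched after the Single-Head Self-Attention of SHViT, again as an assumption about the variant meant: only the first `p_dim` channels pass through one attention head, the remaining channels bypass attention, and a final projection mixes both. Using a single head on a channel subset is intended to cut the redundancy among heads and the memory cost of multi-head attention; the attention map is still n × n, which is affordable on the low-resolution deep layers where the abstract places C2f-CGSA. The function name, weight shapes, and `p_dim` value are hypothetical, and SHViT's pre-normalisation of the attended channels is omitted.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def single_head_self_attention(x: np.ndarray, p_dim: int, w_q: np.ndarray,
                               w_k: np.ndarray, w_v: np.ndarray,
                               w_proj: np.ndarray) -> np.ndarray:
    """Single-head attention on a channel subset (hypothetical helper).

    x: (n, c) tokens; w_q, w_k: (p_dim, qk_dim); w_v: (p_dim, p_dim);
    w_proj: (c, c) output projection.
    """
    x_att, x_keep = x[:, :p_dim], x[:, p_dim:]
    q, k, v = x_att @ w_q, x_att @ w_k, x_att @ w_v
    scores = (q @ k.T) / np.sqrt(q.shape[-1])  # (n, n) map, but only one head
    x_att = softmax(scores, axis=-1) @ v
    # Untouched channels are concatenated back before the projection
    return np.concatenate([x_att, x_keep], axis=-1) @ w_proj


# Usage: 100 tokens, 64 channels, attention on the first 16 channels only
rng = np.random.default_rng(0)
n, c, p_dim, qk_dim = 100, 64, 16, 16
x = rng.standard_normal((n, c))
w_q = rng.standard_normal((p_dim, qk_dim)) * 0.1
w_k = rng.standard_normal((p_dim, qk_dim)) * 0.1
w_v = rng.standard_normal((p_dim, p_dim)) * 0.1
y = single_head_self_attention(x, p_dim, w_q, w_k, w_v, np.eye(c))
print(y.shape)  # (100, 64)
```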

