Journal of Computer Applications (《计算机应用》), sole official website

Small object detection method based on improved DETR algorithm

WU Jun, ZHAO Chuan

  1. College of Computer and Network Security, Chengdu University of Technology
  • Received: 2025-03-18 Revised: 2025-05-08 Online: 2025-05-16 Published: 2025-05-16
  • Corresponding author: ZHAO Chuan
  • About author: WU Jun, born in 2000, from Ziyang, Sichuan, M. S. candidate. His research interests include computer vision. ZHAO Chuan, born in 1967, from Chengdu, Sichuan, Ph. D., associate professor. Her research interests include computer vision and natural language processing.
  • Supported by:
    Sichuan Provincial Science and Technology Innovation Project (24PYXM1008)

Abstract: To address the low accuracy of DETR (DEtection Transformer) on small objects, a small object detection method based on an improved DETR algorithm was proposed. First, because the ResNet-50 backbone extracts small object features weakly and inefficiently and tends to lose detail, an improved MetaFormer combined with a multi-scale attention mechanism was adopted as the DETR backbone to strengthen the representation of small objects. Second, because the Transformer attention module converges slowly and is limited in feature spatial resolution when processing image feature maps, a deformable attention decoder was introduced so that the model attends to key sampling regions around reference points, which accelerates convergence and improves small object detection accuracy. Finally, because the GIoU loss function cannot measure the quality of predicted boxes, the Wise-IoU (WIoU) v3 loss function was introduced; it assigns differentiated gradient gains to predicted boxes of different quality and guides the model to converge to higher accuracy. Experimental results on the COCO2017 object detection dataset showed that, compared with DETR, the proposed method improved average precision for small objects by 7.6 percentage points and overall average precision by 4.7 percentage points, indicating higher detection accuracy.
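The contrast between the GIoU loss and the WIoU v3 gradient gain can be illustrated with a minimal sketch. It is not the authors' implementation. The gain formula r = β / (δ · α^(β − δ)), with outlier degree β equal to a box's IoU loss divided by the mean IoU loss, and the values α = 1.9 and δ = 3.0 are assumptions recalled from the Wise-IoU paper, not stated in this abstract. The function names `iou_and_giou` and `wiou_v3_gain` are hypothetical, and the mean loss is passed in directly rather than tracked as a running average.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU penalizes the part of the enclosing box not covered by the union.
    giou = iou - (enclose - union) / enclose
    return iou, giou


def wiou_v3_gain(iou_loss, mean_iou_loss, alpha=1.9, delta=3.0):
    """Gradient gain r for one predicted box (assumed WIoU v3 form).

    beta is the outlier degree: this box's IoU loss relative to the mean
    IoU loss. alpha and delta are assumed hyperparameter values.
    """
    beta = iou_loss / mean_iou_loss
    return beta / (delta * alpha ** (beta - delta))


if __name__ == "__main__":
    pred, target = (0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)
    iou, giou = iou_and_giou(pred, target)
    print("IoU:", iou, "GIoU:", giou)

    # Gain for high-quality, ordinary, and low-quality boxes (mean loss fixed at 1.0).
    for loss in (0.1, 1.5, 6.0):
        print("IoU loss", loss, "gain", wiou_v3_gain(loss, 1.0))
```

Under these assumptions the gain is largest for boxes of ordinary quality and smaller for both very good and very poor boxes. This is one reading of the "differentiated gradient gains" described above, whereas the GIoU loss weights every box by the same rule regardless of its quality relative to the batch.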

Key words: DEtection Transformer (DETR), small object, deformable attention, multi-scale attention, WIoU (Wise-IoU) v3

CLC Number: