Journal of Computer Applications, 2026, Vol. 46, Issue (2): 564-571. DOI: 10.11772/j.issn.1001-9081.2025030277

• Multimedia computing and computer simulation

Small object detection method based on improved DETR algorithm

Jun WU, Chuan ZHAO

  1. College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, Sichuan 610059, China
  • Received: 2025-03-21 Revised: 2025-05-08 Accepted: 2025-05-09 Online: 2025-05-16 Published: 2026-02-10
  • Contact: Chuan ZHAO
  • About author: WU Jun, born in 2000, native of Ziyang, Sichuan, M. S. candidate. His research interests include computer vision.
    ZHAO Chuan, born in 1967, native of Chengdu, Sichuan, Ph. D., associate professor. Her research interests include computer vision and natural language processing. Email: zhaoc@cdut.edu.cn
  • Supported by:
    Sichuan Provincial Science and Technology Innovation Project (24PYXM1008)

Abstract:

To address the low accuracy of DETR (DEtection TRansformer) in small object detection, a small object detection method based on an improved DETR algorithm was proposed. First, an improved MetaFormer combined with a multi-scale attention mechanism was adopted as the backbone network to overcome the weak extraction ability, low efficiency, and loss of fine detail of the ResNet-50 backbone when extracting small object features, thereby enhancing the model's representation capability for small objects. Second, a deformable attention decoder was introduced to address the slow convergence and limited feature spatial resolution of the Transformer attention module when processing image feature maps. This enables the model to focus on key sampling regions around reference points, accelerating convergence and improving detection accuracy for small objects. Finally, because the GIoU (Generalized Intersection over Union) loss function cannot evaluate prediction box quality, the Wise-IoU (WIoU) v3 loss function was incorporated to assign differentiated gradient gains to prediction boxes of different quality, guiding the model to converge to higher accuracy. Experimental results on the COCO2017 object detection dataset show that, compared with DETR, the proposed method improves the average precision for small objects by 7.6 percentage points and the overall average precision by 4.7 percentage points, demonstrating the superior detection precision of the proposed method.
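The contrast between GIoU and WIoU v3 can be made concrete with a small scalar sketch. It follows the Wise-IoU definitions as we understand them: WIoU v1 scales L_IoU = 1 - IoU by a center-distance attention term R_WIoU, and v3 multiplies that by a non-monotonic focusing coefficient r = beta / (delta * alpha^(beta - delta)), where the outlier degree beta is a box's L_IoU divided by a running mean of L_IoU. The hyperparameters alpha = 1.9 and delta = 3, the momentum 0.01, the initial running mean of 1.0, and all function and class names are assumptions for illustration; the article does not state its settings, and this is not the authors' implementation. It works on plain Python floats without gradients, so the gradient detaching used in practice appears only as a comment.

```python
import math
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2); positive area is assumed.
Box = Tuple[float, float, float, float]


def _area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def box_terms(pred: Box, gt: Box):
    """Return IoU, union area, and width/height of the smallest enclosing box."""
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    union = _area(pred) + _area(gt) - inter
    enc_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    enc_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    return inter / union, union, enc_w, enc_h


def giou_loss(pred: Box, gt: Box) -> float:
    """L_GIoU = 1 - (IoU - (|C| - |A u B|) / |C|), C = smallest enclosing box."""
    iou, union, enc_w, enc_h = box_terms(pred, gt)
    enclose = enc_w * enc_h
    return 1.0 - (iou - (enclose - union) / enclose)


def focusing_coefficient(beta: float, alpha: float, delta: float) -> float:
    """Non-monotonic gain r = beta / (delta * alpha**(beta - delta)); r == 1 when beta == delta."""
    return beta / (delta * alpha ** (beta - delta))


class WIoUv3:
    """Scalar WIoU v3 loss keeping a running mean of L_IoU (hypothetical defaults)."""

    def __init__(self, alpha: float = 1.9, delta: float = 3.0, momentum: float = 0.01):
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.iou_loss_mean = 1.0  # running mean of L_IoU, assumed to start at 1

    def __call__(self, pred: Box, gt: Box) -> float:
        iou, _, enc_w, enc_h = box_terms(pred, gt)
        l_iou = 1.0 - iou
        # WIoU v1: distance attention on the center offset, normalized by the enclosing box.
        # (With autograd, the denominator enc_w**2 + enc_h**2 would be detached.)
        pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
        gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
        r_wiou = math.exp(((pcx - gcx) ** 2 + (pcy - gcy) ** 2) / (enc_w ** 2 + enc_h ** 2))
        # v3: outlier degree compares this box's L_IoU with the running mean.
        beta = l_iou / self.iou_loss_mean
        r = focusing_coefficient(beta, self.alpha, self.delta)
        # Exponential moving average update of the mean, applied after it is used.
        self.iou_loss_mean = (1 - self.momentum) * self.iou_loss_mean + self.momentum * l_iou
        return r * r_wiou * l_iou


loss_fn = WIoUv3()
print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # about 1.079 (= 68/63)
print(loss_fn((0, 0, 2, 2), (1, 1, 3, 3)))
```

The key design difference is visible in `focusing_coefficient`: GIoU treats every box the same, while WIoU v3 gives the largest gain to boxes of ordinary quality and shrinks it both for very good boxes (small beta) and for outliers (large beta).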

Key words: DETR (DEtection TRansformer), small object, deformable attention, multi-scale attention, WIoU (Wise-IoU) v3
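To illustrate what focusing on key sampling regions around reference points means, the following NumPy sketch implements a single-scale deformable attention step for one query in the style of Deformable DETR: each head predicts K sampling offsets and K softmax-normalized weights from the query, bilinearly samples the projected value map at the reference point plus each offset, and the per-head results are concatenated and projected. Random matrices stand in for learned weights. The parameter names (`W_v`, `W_off`, `W_attn`, `W_o`), treating offsets as pixel units, and the single feature level are simplifying assumptions, not details of the proposed decoder.

```python
import numpy as np


def bilinear_sample(feat: np.ndarray, x: float, y: float) -> np.ndarray:
    """Sample feat of shape (H, W, C) at continuous pixel coords (x, y); zeros outside the map."""
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
            if 0 <= xi < W and 0 <= yi < H:
                out += weight * feat[yi, xi]
    return out


def make_params(C, n_heads, n_points, rng):
    """Random stand-ins for learned projections (hypothetical names)."""
    return {
        "W_v": rng.normal(scale=C ** -0.5, size=(C, C)),
        "W_off": rng.normal(scale=0.5, size=(C, n_heads * n_points * 2)),
        "W_attn": rng.normal(size=(C, n_heads * n_points)),
        "W_o": rng.normal(scale=C ** -0.5, size=(C, C)),
    }


def deformable_attention(query, ref_point, value_map, params, n_heads, n_points):
    """One query attends to n_points sampled locations per head around ref_point.

    query: (C,) embedding; ref_point: (x, y) normalized to [0, 1]; value_map: (H, W, C).
    """
    H, W, C = value_map.shape
    d = C // n_heads
    value = value_map @ params["W_v"]  # value projection, (H, W, C)
    offsets = (query @ params["W_off"]).reshape(n_heads, n_points, 2)
    logits = (query @ params["W_attn"]).reshape(n_heads, n_points)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over each head's sampling points
    # Normalized reference point -> pixel coordinates (pixel centers at +0.5).
    px, py = ref_point[0] * W - 0.5, ref_point[1] * H - 0.5
    heads = []
    for m in range(n_heads):
        v_m = value[..., m * d:(m + 1) * d]  # this head's channel slice
        acc = np.zeros(d)
        for k in range(n_points):
            dx, dy = offsets[m, k]
            acc += attn[m, k] * bilinear_sample(v_m, px + dx, py + dy)
        heads.append(acc)
    return np.concatenate(heads) @ params["W_o"]  # output projection, (C,)


rng = np.random.default_rng(0)
C, n_heads, n_points = 8, 2, 4
params = make_params(C, n_heads, n_points, rng)
feature_map = rng.normal(size=(16, 16, C))
out = deformable_attention(rng.normal(size=C), (0.3, 0.7), feature_map, params, n_heads, n_points)
print(out.shape)  # (8,)
```

Because each query touches only n_heads × n_points locations instead of all H × W positions, the per-query cost does not grow with the feature-map size, which is why this form of attention can afford higher-resolution feature maps than DETR's dense attention.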

