Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (2): 564-571. DOI: 10.11772/j.issn.1001-9081.2025030277
• Multimedia computing and computer simulation •
Jun WU, Chuan ZHAO
Received: 2025-03-21
Revised: 2025-05-08
Accepted: 2025-05-09
Online: 2025-05-16
Published: 2026-02-10
Contact: Chuan ZHAO
About author: WU Jun, born in 2000 in Ziyang, Sichuan, M. S. candidate. His research interests include computer vision.
Jun WU, Chuan ZHAO. Small object detection method based on improved DETR algorithm[J]. Journal of Computer Applications, 2026, 46(2): 564-571.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025030277
| Stage | Module | Stacked blocks | Output size |
|---|---|---|---|
| 1 | EmaFormer | 3 | 32×160×160 |
| 2 | EmaFormer | 3 | 64×80×80 |
| 3 | EmaFormer | 9 | 128×40×40 |
| 4 | EmaFormer | 3 | 256×20×20 |
Tab. 1 Parameters of improved MetaFormer backbone network
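The stage layout in Tab. 1 can be summarized in a short configuration sketch. This is a minimal illustration, not code from the paper: the block counts and channel widths come from the table, while the function name, the 640×640 input size, and the stride-4-stem-then-2×-downsampling assumption are ours (a common MetaFormer-style layout consistent with the table's output sizes).

```python
# Stage layout of the improved MetaFormer backbone (Tab. 1).
# Each entry: (number of stacked EmaFormer blocks, output channels).
STAGES = [(3, 32), (3, 64), (9, 128), (3, 256)]

def stage_output_sizes(input_size=640, stem_stride=4, num_stages=4):
    """Per-stage feature-map side length, assuming a stride-4 stem
    followed by 2x downsampling between stages (an assumption here,
    chosen to match the table's output sizes)."""
    sizes = []
    side = input_size // stem_stride
    for _ in range(num_stages):
        sizes.append(side)
        side //= 2
    return sizes

print(stage_output_sizes(640))  # [160, 80, 40, 20], matching Tab. 1
```

With a 640×640 input, the four stages produce 160, 80, 40, and 20 pixel sides, which agrees with the 32×160×160 through 256×20×20 output sizes listed in Tab. 1.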
| Parameter | Setting | Parameter | Setting |
|---|---|---|---|
| Epochs | 75 | Optimizer | AdamW |
| Batch_size | 4 | — | 3.0 |
| Learning rate | 0.000 1 | — | 1.9 |
| Weight decay | 0.000 1 | | |
Tab. 2 Key parameters for training improved DETR model
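The training hyperparameters in Tab. 2 map directly onto a standard optimizer setup. The sketch below is ours, not the authors' code; it records only the values the table preserves (the two settings 3.0 and 1.9 are omitted because their parameter names were lost in extraction), and the `make_optimizer` helper assumes PyTorch's `torch.optim.AdamW` is available.

```python
# Key training settings for the improved DETR model (Tab. 2).
TRAIN_CONFIG = {
    "epochs": 75,
    "batch_size": 4,
    "learning_rate": 1e-4,   # "0.000 1" in the table
    "weight_decay": 1e-4,
    "optimizer": "AdamW",
}

def make_optimizer(params, cfg=TRAIN_CONFIG):
    """Build the optimizer named in the config (a sketch assuming
    PyTorch; torch.optim.AdamW takes lr and weight_decay keywords)."""
    import torch
    return torch.optim.AdamW(
        params, lr=cfg["learning_rate"], weight_decay=cfg["weight_decay"]
    )
```

AdamW with a 1e-4 learning rate and 1e-4 weight decay is the conventional DETR-family recipe, so the table's settings are unsurprising; the small batch size of 4 reflects the memory cost of Transformer decoders at 640×640 resolution.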
| No. | ResNet-50 backbone | MetaFormer backbone | Improved MetaFormer backbone | Attention decoder | Deformable attention decoder | Original DETR loss | Optimized loss | APS/% | AP50/% | AP50:95/% |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | √ | | | √ | | √ | | 22.5 | 63.1 | 43.3 |
| 2 | | √ | | √ | | √ | | 25.9 | 61.3 | 44.5 |
| 3 | | | √ | √ | | √ | | 27.1 | 63.1 | 46.1 |
| 4 | | | √ | | √ | √ | | 29.3 | 64.8 | 47.2 |
| 5 | | | √ | | √ | | √ | 30.1 | 65.9 | 48.0 |
Tab. 3 Ablation experimental results
| Model | Parameters/10⁶ | AP50:95/% | AP50/% | AP75/% | APS/% | APM/% | APL/% |
|---|---|---|---|---|---|---|---|
| DETR[1] | 41 | 43.3 | 63.1 | 45.9 | 22.5 | 47.3 | 61.1 |
| Conditional-DETR | 44 | 45.0 | 65.4 | 48.5 | 25.3 | 49.9 | 62.2 |
| Anchor-DETR | 37 | 44.2 | 64.7 | 47.5 | 24.7 | 48.2 | 60.6 |
| Deformable-DETR[4] | 40 | 46.2 | 65.2 | 50.0 | 28.8 | 49.2 | 61.7 |
| DN-DETR | 48 | 46.3 | 66.4 | 49.7 | 26.7 | 50.0 | 64.3 |
| DAB-DETR | 48 | 44.5 | 65.1 | 47.7 | 25.3 | 48.2 | 62.3 |
| Efficient DETR | 35 | 45.1 | 63.1 | 49.1 | 28.3 | 48.4 | 59.0 |
| SMCA-DETR | 40 | 45.6 | 65.5 | 49.1 | 25.9 | 49.3 | 62.6 |
| TSP-FCOS | — | 43.1 | 62.3 | 47.0 | 26.6 | 46.8 | 55.9 |
| TSP-RCNN | — | 43.8 | 63.3 | 48.3 | 28.6 | 46.9 | 55.7 |
| Sparse DETR | 41 | 46.3 | 66.0 | 50.1 | 29.0 | 49.5 | 60.8 |
| SAM-DETR | 58 | 45.0 | 65.4 | 47.9 | 26.2 | 49.0 | 63.3 |
| Proposed model | 45 | 48.0 | 65.9 | 51.8 | 30.1 | 52.9 | 66.0 |
Tab. 4 Results of horizontal comparison experiments
| [1] | CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12346. Cham: Springer, 2020: 213-229. |
| [2] | YU W, SI C, ZHOU P, et al. MetaFormer baselines for vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(2): 896-912. |
| [3] | OUYANG D, HE S, ZHANG G, et al. Efficient multi-scale attention module with cross-spatial learning[C]// Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2023: 1-5. |
| [4] | ZHU X, SU W, LU L, et al. Deformable DETR: deformable Transformers for end-to-end object detection[EB/OL]. [2024-10-13]. |
| [5] | TONG Z, CHEN Y, XU Z, et al. Wise-IoU: bounding box regression loss with dynamic focusing mechanism[EB/OL]. [2024-10-02]. |
| [6] | GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. |
| [7] | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. |
| [8] | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. |
| [9] | REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. |
| [10] | REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2024-12-25]. |
| [11] | BOCHKOVSKIY A, WANG C Y, MARK LIAO H Y. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2020-04-23]. |
| [12] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37. |
| [13] | DENG C, WANG M, LIU L, et al. Extended feature pyramid network for small object detection[J]. IEEE Transactions on Multimedia, 2022, 24: 1968-1979. |
| [14] | LIM J S, ASTRID M, YOON H J, et al. Small object detection using context and attention[C]// Proceedings of the 2021 International Conference on Artificial Intelligence in Information and Communication. Piscataway: IEEE, 2021: 181-186. |
| [15] | CUI L, LV P, JIANG X, et al. Context-aware block net for small object detection[J]. IEEE Transactions on Cybernetics, 2022, 52(4): 2300-2313. |
| [16] | WANG G, CHEN Y, AN P, et al. UAV-YOLOv8: a small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios[J]. Sensors, 2023, 23(16): No.7190. |
| [17] | LENG J, REN Y, JIANG W, et al. Realize your surroundings: exploiting context information for small object detection[J]. Neurocomputing, 2021, 433: 287-299. |
| [18] | LIU H, SUN F, GU J, et al. SF-YOLOv5: a lightweight small object detection algorithm based on improved feature fusion mode[J]. Sensors, 2022, 22(15): No.5817. |
| [19] | MIN K, LEE G H, LEE S W. Attentional feature pyramid network for small object detection[J]. Neural Networks, 2022, 155: 439-450. |
| [20] | TANG S, ZHANG S, FANG Y. HIC-YOLOv5: improved YOLOv5 for small object detection[C]// Proceedings of the 2024 IEEE International Conference on Robotics and Automation. Piscataway: IEEE, 2024: 6614-6619. |
| [21] | JING R, ZHANG W, LIU Y, et al. An effective method for small object detection in low-resolution images[J]. Engineering Applications of Artificial Intelligence, 2024, 127(Pt A): No.107206. |
| [22] | TONG K, WU Y. Small object detection using deep feature learning and feature fusion network[J]. Engineering Applications of Artificial Intelligence, 2024, 132: No.107931. |
| [23] | LI L, LI B, ZHOU H. Lightweight multi-scale network for small object detection[J]. PeerJ Computer Science, 2022, 8: No.e1145. |
| [24] | YAN B, LI J, YANG Z, et al. AIE-YOLO: auxiliary information enhanced YOLO for small object detection[J]. Sensors, 2022, 22(21): No.8221. |
| [25] | HUANG S, LIU Q. Addressing scale imbalance for small object detection with dense detector[J]. Neurocomputing, 2022, 473: 68-78. |
| [26] | WANG M, YANG W, WANG L, et al. FE-YOLOv5: feature enhancement network based on YOLOv5 for small object detection[J]. Journal of Visual Communication and Image Representation, 2023, 90: No.103752. |
| [27] | JI S J, LING Q H, HAN F. An improved algorithm for small object detection based on YOLO v4 and multi-scale contextual information[J]. Computers and Electrical Engineering, 2023, 105: No.108490. |
| [28] | HAO C, ZHANG H, SONG W, et al. SliNet: slicing-aided learning for small object detection[J]. IEEE Signal Processing Letters, 2024, 31: 790-794. |
| [29] | ZHANG X, LU T, WANG J, et al. Small object detection by edge-aware neural network[J]. Engineering Applications of Artificial Intelligence, 2024, 138(Pt B): No.109406. |
| [30] | WANG L, ZHOU Z, SHI G, et al. Small object detection based on bidirectional feature fusion and multi-scale distillation[C]// Proceedings of the 2024 International Conference on Artificial Neural Networks, LNCS 15017. Cham: Springer, 2024: 200-214. |
| [31] | LI X, LI X, TAN H, et al. SAMF: small-area-aware multi-focus image fusion for object detection[C]// Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2024: 3845-3849. |
| [32] | LIU S, LI F, ZHANG H, et al. DAB-DETR: dynamic anchor boxes are better queries for DETR[EB/OL]. [2025-04-03]. |
| [33] | MENG D, CHEN X, FAN Z, et al. Conditional DETR for fast training convergence[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3631-3640. |
| [34] | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
| [35] | WOO S, DEBNATH S, HU R, et al. ConvNeXt V2: co-designing and scaling ConvNets with masked autoencoders[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 16133-16142. |
| [36] | ZHU L, WANG X, KE Z, et al. BiFormer: Vision Transformer with bi-level routing attention[C]// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 10323-10333. |
| [37] | REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. |
| [38] | WANG Y, ZHANG X, YANG T, et al. Anchor DETR: query design for transformer-based detector[C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2022: 2567-2575. |
| [39] | LI F, ZHANG H, LIU S, et al. DN-DETR: accelerate DETR training by introducing query denoising[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 13609-13617. |
| [40] | YAO Z, AI J, LI B, et al. Efficient DETR: improving end-to-end object detector with dense prior[EB/OL]. [2025-04-03]. |
| [41] | GAO P, ZHENG M, WANG X, et al. Fast convergence of DETR with spatially modulated co-attention[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3601-3610. |
| [42] | SUN Z, CAO S, YANG Y, et al. Rethinking Transformer-based set prediction for object detection[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 3591-3600. |
| [43] | ROH B, SHIN J, SHIN W, et al. Sparse DETR: efficient end-to-end object detection with learnable sparsity[EB/OL]. [2025-01-13]. |
| [44] | ZHANG G, LUO Z, YU Y, et al. Accelerating DETR convergence via semantic-aligned matching[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 939-948. |