Journal of Computer Applications, 2024, Vol. 44, Issue (12): 3922-3929. DOI: 10.11772/j.issn.1001-9081.2023121796
• Multimedia computing and computer simulation •
Yudong PANG1, Zhixing LI1, Weijie LIU1, Tianhao LI1, Ningning WANG2
Received: 2023-12-26
Revised: 2024-03-15
Accepted: 2024-03-18
Online: 2024-04-10
Published: 2024-12-10
Contact: Zhixing LI
About author: PANG Yudong, born in 1999 in Zaozhuang, Shandong, M. S. candidate, CCF member. His research interests include computer vision, image recognition, and object detection.
Yudong PANG, Zhixing LI, Weijie LIU, Tianhao LI, Ningning WANG. Small target detection model in overlooking scenes on tower cranes based on improved real-time detection Transformer[J]. Journal of Computer Applications, 2024, 44(12): 3922-3929.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023121796
| Re-parameterization | FasterNet Block | SIoU | Precision | Recall | mAP50 | mAP50-90 | FPS | GFLOPs |
|---|---|---|---|---|---|---|---|---|
|  |  |  | 0.886 | 0.882 | 0.894 | 0.513 | 49.3 | 63.3 |
| √ |  |  | 0.888 | 0.869 | 0.887 | 0.482 | 60.6 | 22.5 |
|  | √ |  | 0.923 | 0.882 | 0.907 | 0.510 | 50.4 | 63.2 |
|  |  | √ | 0.931 | 0.885 | 0.918 | 0.529 | 50.1 | 65.0 |
| √ | √ |  | 0.914 | 0.882 | 0.906 | 0.516 | 54.9 | 25.1 |
| √ |  | √ | 0.902 | 0.891 | 0.897 | 0.516 | 51.8 | 24.9 |
|  | √ | √ | 0.940 | 0.873 | 0.908 | 0.513 | 52.8 | 49.5 |
| √ | √ | √ | 0.947 | 0.897 | 0.917 | 0.534 | 59.7 | 28.5 |
Tab. 1 Ablation experiment results of three improvements on RT-DETR model
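To make the ablation easier to read, the following minimal sketch compares the first row of Tab. 1 (baseline RT-DETR, no improvements) with the last row (all three improvements enabled). The numbers are copied from the table; the dictionary layout and the helper names `point_gain` and `relative_change` are illustrative assumptions, not part of the paper.

```python
# Values copied from the first and last rows of Tab. 1.
# The variable and function names are illustrative, not taken from the paper.
baseline = {"precision": 0.886, "recall": 0.882, "mAP50": 0.894,
            "fps": 49.3, "gflops": 63.3}
improved = {"precision": 0.947, "recall": 0.897, "mAP50": 0.917,
            "fps": 59.7, "gflops": 28.5}


def point_gain(key: str) -> float:
    """Absolute change in percentage points for a metric stored in [0, 1]."""
    return round((improved[key] - baseline[key]) * 100, 1)


def relative_change(key: str) -> float:
    """Change relative to the baseline value, in percent."""
    return round((improved[key] - baseline[key]) / baseline[key] * 100, 1)


precision_gain = point_gain("precision")
recall_gain = point_gain("recall")
map50_gain = point_gain("mAP50")
fps_change = relative_change("fps")
gflops_change = relative_change("gflops")

print(f"Precision: {precision_gain:+.1f} percentage points")
print(f"Recall:    {recall_gain:+.1f} percentage points")
print(f"mAP50:     {map50_gain:+.1f} percentage points")
print(f"FPS:       {fps_change:+.1f} % relative to baseline")
print(f"GFLOPs:    {gflops_change:+.1f} % relative to baseline")
```

Under these assumptions the full model gains about 6.1 percentage points of precision and roughly 21% in FPS, while GFLOPs fall by about 55%.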
Model | Precision | Recall | mAP50 | mAP50-90 |
---|---|---|---|---|
YOLOv5 | 0.841 | 0.808 | 0.725 | 0.419 |
YOLOv7 | 0.859 | 0.812 | 0.731 | 0.430 |
YOLOv8 | 0.872 | 0.828 | 0.752 | 0.431 |
DETR | 0.860 | 0.870 | 0.885 | 0.493 |
RT-DETR | 0.886 | 0.882 | 0.894 | 0.513 |
Proposed model | 0.947 | 0.897 | 0.917 | 0.534 |
Tab. 2 Comparison results of detection performance of different detection models on OP dataset
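As a reading aid for Tab. 2, the sketch below computes how far the last row of the table (the proposed model) is ahead of each compared detector in mAP50 on the OP dataset. The values are copied from the table; the helper `margins_over` and the key "Proposed model" are hypothetical names chosen for this illustration.

```python
# mAP50 values copied from Tab. 2 (OP dataset).
# "Proposed model" denotes the last row of the table; names are illustrative.
map50 = {
    "YOLOv5": 0.725,
    "YOLOv7": 0.731,
    "YOLOv8": 0.752,
    "DETR": 0.885,
    "RT-DETR": 0.894,
    "Proposed model": 0.917,
}


def margins_over(reference: str, scores: dict) -> dict:
    """Margin of `reference` over every other entry, in percentage points."""
    ref = scores[reference]
    return {
        name: round((ref - value) * 100, 1)
        for name, value in scores.items()
        if name != reference
    }


margins = margins_over("Proposed model", map50)

# List the compared models from the closest competitor to the farthest.
for name, margin in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name:8s} trails by {margin:.1f} percentage points of mAP50")
```

The smallest margin is over RT-DETR (about 2.3 percentage points) and the largest is over YOLOv5 (about 19.2 percentage points).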
Model | AP | AP50 | AP75 |
---|---|---|---|
YOLOv5 | 0.444 | 0.631 | 0.456 |
YOLOv7 | 0.453 | 0.648 | 0.483 |
YOLOv8 | 0.455 | 0.639 | 0.488 |
DETR | 0.435 | 0.618 | 0.442 |
RT-DETR | 0.459 | 0.652 | 0.498 |
Proposed model | 0.468 | 0.662 | 0.510 |
Tab. 3 Comparison of detection performance of different detection models on COCO dataset