Journal of Computer Applications, 2024, Vol. 44, Issue (12): 3922-3929. DOI: 10.11772/j.issn.1001-9081.2023121796

• Multimedia computing and computer simulation •

Small target detection model in overlooking scenes on tower cranes based on improved real-time detection Transformer

Yudong PANG1, Zhixing LI1, Weijie LIU1, Tianhao LI1, Ningning WANG2

  1. School of Mechanical, Electrical and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
  2. School of Intelligent Manufacturing, Luoyang Institute of Science and Technology, Luoyang, Henan 471023, China
  • Received:2023-12-26 Revised:2024-03-15 Accepted:2024-03-18 Online:2024-04-10 Published:2024-12-10
  • Contact: Zhixing LI
  • About author: PANG Yudong, born in 1999, M.S. candidate. His research interests include computer vision, image recognition, and object detection.
    LIU Weijie, born in 2000, M.S. candidate. His research interests include computer vision and artificial intelligence.
    LI Tianhao, born in 1998, M.S. candidate. His research interests include computer vision and artificial intelligence.
    WANG Ningning, born in 1986, Ph.D., lecturer. His research interests include image recognition and magnetorheological fluid transmission.
  • Supported by:
    Fundamental Research Funds for Universities in Beijing (X21053); Key Scientific Research Project of Higher Education Institutions of Henan Province (23A460020); Natural Science Foundation of Henan Province (242300420044)

基于改进实时检测Transformer的塔机上俯视场景小目标检测模型

庞玉东1, 李志星1, 刘伟杰1, 李天昊1, 王宁宁2

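  1. School of Mechanical, Electrical and Vehicle Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616
  2. School of Intelligent Manufacturing, Luoyang Institute of Science and Technology, Luoyang, Henan 471023
  • Corresponding author: 李志星
  • About the authors: 庞玉东 (born 1999), male, from Zaozhuang, Shandong; M.S. candidate; CCF member. Research interests: computer vision, image recognition, object detection.
    刘伟杰 (born 2000), male, from Yushu, Jilin; M.S. candidate. Research interests: computer vision, artificial intelligence.
    李天昊 (born 1998), male, from Beijing; M.S. candidate. Research interests: computer vision, artificial intelligence.
    王宁宁 (born 1986), male, from Luoyang, Henan; lecturer, Ph.D. Research interests: image recognition, magnetorheological fluid transmission.
  • Funding:
    Fundamental Research Funds for Universities in Beijing (X21053); Key Scientific Research Project of Higher Education Institutions of Henan Province (23A460020); Natural Science Foundation of Henan Province (242300420044)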

Abstract:

To address construction-site personnel safety problems, such as casualties from falling objects and tower crane collapse caused by collisions between tower crane hooks, a small target detection model for overlooking scenes on tower cranes based on an improved Real-Time DEtection TRansformer (RT-DETR) was proposed. Firstly, a multi-branch training, single-branch inference structure designed with the idea of model reparameterization was added to the original model to improve detection speed. Secondly, the convolution module in FasterNet Block was redesigned and used to replace BasicBlock in the original backbone to improve the performance of the detection model. Thirdly, the new loss function Inner-SIoU (Inner-Structured Intersection over Union) was used to further improve the precision and convergence speed of the model. Finally, ablation and comparison experiments were conducted to verify the model performance. The results show that, when detecting small target images in overlooking scenes on tower cranes, the proposed model achieves a precision of 94.7%, which is 6.1 percentage points higher than that of the original RT-DETR model. The proposed model also reaches 59.7 Frames Per Second (FPS), a 21% improvement in detection speed over the original model. On the public dataset COCO 2017, the Average Precision (AP) of the proposed model is 2.4, 1.5, and 1.3 percentage points higher than those of YOLOv5, YOLOv7, and YOLOv8, respectively. The proposed model therefore meets the precision and speed requirements for small target detection in overlooking scenes on tower cranes.
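
The reparameterization idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the model's actual block: it assumes a one-dimensional signal, a 3-tap branch, a 1-tap branch, and an identity shortcut, and it omits the batch-normalization folding that reparameterized networks typically also perform. All function names here are hypothetical. The point is only that a multi-branch training-time structure can be collapsed into a single kernel that gives the same output at inference.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded to keep the length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(x))
    ]


def multi_branch(x, k3, k1):
    """Training-time form (assumed): 3-tap branch + 1-tap branch + identity."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, k1)
    return [a + b + c for a, b, c in zip(y3, y1, x)]


def reparameterize(k3, k1):
    """Fold the 1-tap kernel and the identity into the center tap of the 3-tap kernel."""
    fused = list(k3)
    fused[1] += k1[0] + 1.0  # identity is equivalent to a 1-tap kernel of weight 1
    return fused


# Illustrative values only; they are not taken from the paper.
x = [1.0, 2.0, -1.0, 0.5]
k3 = [0.2, -0.5, 0.3]
k1 = [0.7]

train_out = multi_branch(x, k3, k1)                 # three branches
infer_out = conv1d_same(x, reparameterize(k3, k1))  # one fused kernel
max_diff = max(abs(a - b) for a, b in zip(train_out, infer_out))
print("largest difference between the two forms:", max_diff)
```

Because convolution is linear, the fused kernel reproduces the multi-branch output up to floating-point rounding while needing only one pass over the input, which is the source of the inference speed-up this kind of structure aims for.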

Key words: target detection, RT-DETR (Real-Time DEtection TRansformer), small target, Transformer, computer vision, attention mechanism

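Abstract (translated from the Chinese version):

To address construction-site personnel safety problems, such as falling objects and tower crane collapse caused by collisions between tower crane hooks and the resulting casualties, a small target detection model for overlooking scenes on tower cranes based on an improved Real-Time DEtection TRansformer (RT-DETR) is proposed. First, a multi-branch training, single-branch inference structure designed with the idea of model reparameterization is added to the original model to improve detection speed. Second, the convolution module in FasterNet Block is redesigned and used to replace BasicBlock in the original backbone to improve the performance of the detection model. Third, the new loss function Inner-SIoU (Inner-Structured Intersection over Union) is used to further improve model precision and convergence speed. Finally, ablation and comparison experiments are conducted to verify the model performance. The results show that, when detecting small target images viewed from the top of a tower crane, the proposed model reaches a precision of 94.7%, which is 6.1 percentage points higher than the original RT-DETR model; its Frames Per Second (FPS) reaches 59.7, a 21% improvement in detection speed over the original model. On the public dataset COCO 2017, the Average Precision (AP) of the proposed model is 2.4, 1.5, and 1.3 percentage points higher than those of YOLOv5, YOLOv7, and YOLOv8, respectively. The proposed model therefore meets the precision and speed requirements for small target detection in overlooking scenes on tower cranes.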
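
The "Inner" part of the Inner-SIoU loss can be sketched as follows. This is a minimal illustration, assuming the commonly described Inner-IoU formulation in which IoU is computed on auxiliary boxes obtained by scaling each box about its own center by a ratio; the SIoU terms are omitted, and the function name, box layout `(cx, cy, w, h)`, and default ratio are assumptions rather than details from the paper.

```python
def inner_iou(box_a, box_b, ratio=0.75):
    """IoU of auxiliary boxes: each (cx, cy, w, h) box is scaled by `ratio` about its center."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Scale widths and heights; centers stay fixed.
    aw, ah, bw, bh = aw * ratio, ah * ratio, bw * ratio, bh * ratio

    # Overlap of the scaled boxes along each axis (zero if they do not overlap).
    inter_w = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    inter_h = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Illustrative boxes only; they are not taken from the paper's dataset.
pred = (10.0, 10.0, 4.0, 4.0)
target = (12.0, 10.0, 4.0, 4.0)
for r in (0.5, 1.0, 1.5):
    print(r, inner_iou(pred, target, ratio=r))
```

With a ratio of 1.0 the function reduces to ordinary IoU. A ratio below 1 shrinks the auxiliary boxes, so the overlap drops faster as the boxes drift apart, while a ratio above 1 enlarges them and still gives a non-zero overlap for boxes that barely touch.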

Keywords: object detection, RT-DETR, small target, Transformer, computer vision, attention mechanism

CLC Number: