To address safety problems for construction site personnel, such as casualties caused by falling objects and tower crane collapses caused by collisions between crane hooks, a small target detection model for overhead scenes viewed from tower cranes is proposed, based on an improved Real-Time DEtection TRansformer (RT-DETR). First, a multi-branch training, single-branch inference structure, designed using the idea of model reparameterization, is added to the original model to improve detection speed. Second, the convolution module in the FasterNet Block is redesigned, and the resulting block replaces the BasicBlock in the original backbone to improve the performance of the detection model. Third, a new loss function, Inner-SIoU (Inner-Structured Intersection over Union), is used to further improve the precision and convergence speed of the model. Finally, ablation and comparison experiments are conducted to verify the model's performance. The results show that, on small target images of overhead scenes from tower cranes, the proposed model achieves a precision of 94.7%, which is 6.1 percentage points higher than that of the original RT-DETR model. The proposed model also reaches 59.7 Frames Per Second (FPS), a 21% improvement in detection speed over the original model. On the public COCO 2017 dataset, the Average Precision (AP) of the proposed model is 2.4, 1.5, and 1.3 percentage points higher than that of YOLOv5, YOLOv7, and YOLOv8, respectively. These results indicate that the proposed model meets the precision and speed requirements for small target detection in overhead scenes from tower cranes.
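As an illustration of the "Inner" component of the loss named above, the following minimal sketch computes an IoU over auxiliary boxes that are scaled about each box's center. It assumes the commonly described Inner-IoU formulation; it does not reproduce the SIoU terms or the exact loss used in this work. The function name `inner_iou`, the `(cx, cy, w, h)` box layout, and the default `ratio` value are assumptions made for the example.

```python
def inner_iou(box, gt, ratio=0.75):
    """IoU of two auxiliary boxes obtained by scaling each box about its center.

    Boxes are (cx, cy, w, h). A ratio below 1 shrinks both boxes, a ratio
    above 1 enlarges them. The default ratio is illustrative only.
    """
    def corners(b):
        cx, cy, w, h = b
        half_w = w * ratio / 2.0
        half_h = h * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = corners(box)
    bx1, by1, bx2, by2 = corners(gt)

    # Overlap of the two scaled boxes, clamped at zero when they are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union of the two scaled boxes.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

With `ratio=1.0` the function reduces to the ordinary IoU. With a smaller ratio, two boxes that overlap only partially can end up with zero overlap between their auxiliary boxes, which makes the measure stricter for predictions that are already close to the target.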