Lightweight infrared road scene detection model based on multiscale and weighted coordinate attention
Xiaohui CHENG, Yuntian HUANG, Ruifang ZHANG
Journal of Computer Applications, 2024, 44(6): 1927-1934. DOI: 10.11772/j.issn.1001-9081.2023060775

Occlusion and a lack of texture detail in infrared targets cause false and missed detections in road scenes. To address this, a lightweight infrared road scene detection YOLO (You Only Look Once) model based on Multi-Scale and weighted Coordinate attention (MSC-YOLO) was proposed, with YOLOv7-tiny as the baseline model. Firstly, the multi-scale pyramid module PSA (Pyramid Split Attention) was introduced into different intermediate feature layers of MobileNetV3 to design a lightweight backbone network, MSM-Net (Multi-Scale Mobile Network), for multi-scale feature extraction. This addresses the feature pollution caused by fixed-size convolution kernels and improves fine-grained feature extraction for targets of different scales. Secondly, a Weighted Coordinate Attention (WCA) mechanism was integrated into the feature fusion network, and the target position information obtained along the vertical and horizontal spatial directions of the intermediate feature map was superimposed to enhance the fusion of target features in different dimensions. Finally, the localization loss function was replaced with Efficient Intersection over Union (EIoU), which computes the width and height differences between the predicted box and the ground-truth box separately, accelerating convergence. Verification experiments were carried out on the FLIR dataset. Compared with the YOLOv7-tiny model, the number of parameters is reduced by 67.3%, the number of floating-point operations by 54.6%, and the model size by 60.5%, while mAP (mean Average Precision) at IoU=0.5 drops by only 0.7 percentage points. The model reaches 101 Frames Per Second (FPS) on an RTX 2080Ti, balancing detection performance against model size and meeting the real-time detection requirements of infrared road scenes.
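
To make the EIoU localization loss mentioned in the abstract concrete, here is a minimal sketch in plain Python for a single pair of axis-aligned boxes. It follows the commonly published EIoU definition (1 - IoU plus separate center-distance, width, and height penalties normalized by the smallest enclosing box). It is not the authors' implementation: the function name `eiou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for illustration.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; boxes are assumed valid
    (x2 > x1 and y2 > y1).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height penalties are computed separately, which is what
    # distinguishes EIoU from aspect-ratio based variants such as CIoU.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

For example, `eiou_loss((0, 0, 2, 2), (0, 0, 4, 2))` combines an IoU term of 0.5, a center term of 0.05, and a width term of 0.25, with no height penalty because both boxes have the same height. A training pipeline would use a batched tensor version of the same arithmetic.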
