Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1927-1934.DOI: 10.11772/j.issn.1001-9081.2023060775

Special Issue: Multimedia computing and computer simulation

• Multimedia computing and computer simulation •

Lightweight infrared road scene detection model based on multiscale and weighted coordinate attention

Xiaohui CHENG1,2, Yuntian HUANG1, Ruifang ZHANG3

  1. College of Computer Science and Engineering, Guilin University of Technology, Guilin Guangxi 541006, China
    2. Guangxi Key Laboratory of Embedded Technology and Intelligent System (Guilin University of Technology), Guilin Guangxi 541006, China
    3. College of Mechanical and Control Engineering, Guilin University of Technology, Guilin Guangxi 541006, China
  • Received: 2023-06-20 Revised: 2023-09-11 Accepted: 2023-09-12 Online: 2023-09-27 Published: 2024-06-10
  • Contact: Ruifang ZHANG
  • About author: CHENG Xiaohui, born in 1961, professor. His research interests include embedded systems, IoT, and artificial intelligence.
    HUANG Yuntian, born in 1999, M.S. candidate. His research interests include embedded systems and object detection.
  • Supported by:
    Guangxi Innovation Driven Development Special Fund (Guike AA18118009)

基于多尺度和加权坐标注意力的轻量化红外道路场景检测模型

程小辉1,2, 黄云天1, 张瑞芳3

  1. 桂林理工大学 信息科学与工程学院, 广西 桂林 541006
    2.广西嵌入式技术与智能系统重点实验室(桂林理工大学), 广西 桂林 541006
    3.桂林理工大学 机械与控制工程学院, 广西 桂林 541006
  • 通讯作者: 张瑞芳
  • 作者简介:程小辉(1961—),男,江西樟树人,教授,主要研究方向:嵌入式系统、物联网、人工智能
    黄云天(1999—),男,广西桂林人,硕士研究生,主要研究方向:嵌入式系统、目标检测;
  • 基金资助:
    广西创新驱动发展专项(桂科AA18118009)

Abstract:

To address the false and missed detections caused by occlusion and the lack of texture detail in infrared targets in road scenes, a lightweight infrared road scene detection YOLO (You Only Look Once) model based on Multi-Scale and weighted Coordinate attention (MSC-YOLO) was proposed, with YOLOv7-tiny as the baseline model. Firstly, the multi-scale pyramid module PSA (Pyramid Split Attention) was introduced into different intermediate feature layers of MobileNetV3 to design MSM-Net (Multi-Scale Mobile Network), a lightweight backbone network for multi-scale feature extraction, which solves the problem of feature pollution caused by fixed-size convolution kernels and improves fine-grained feature extraction for targets of different scales. Secondly, a Weighted Coordinate Attention (WCA) mechanism was integrated into the feature fusion network, and the target position information obtained along the vertical and horizontal spatial directions of the intermediate feature map was superimposed to enhance the fusion of target features across different dimensions. Finally, the positioning loss function was replaced with Efficient Intersection over Union (EIoU), which calculates the length and width influencing factors of the predicted box and the ground-truth box separately, accelerating convergence. Verification experiments were carried out on the FLIR dataset. Compared with the YOLOv7-tiny model, MSC-YOLO reduces the number of parameters by 67.3%, the number of floating-point operations by 54.6%, and the model size by 60.5%, while mAP(IoU=0.5) (mean Average Precision at an IoU threshold of 0.5) decreases by only 0.7 percentage points. The Frames Per Second (FPS) reaches 101 on an RTX 2080Ti, so the model balances detection performance against lightweight design and meets the real-time detection requirements of infrared road scenes.

Key words: infrared road scene detection, multi-scale, Weighted Coordinate Attention (WCA), lightweight, positioning loss function
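
The abstract states that the EIoU positioning loss treats the length and width factors of the predicted and ground-truth boxes separately, but it does not give the formula. The following is a minimal sketch of the commonly published EIoU formulation (IoU term, plus center-distance, width, and height penalties normalized by the smallest enclosing box), not the authors' implementation. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Illustrative EIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Hypothetical helper: follows the commonly published EIoU formulation,
    not code from the paper.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height penalties are computed separately, each
    # normalized by the matching side of the enclosing box.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

For example, `eiou_loss((0, 0, 4, 2), (0, 0, 2, 2))` penalizes only the width mismatch and the center offset, because the two boxes share the same height. In a training pipeline this scalar version would be replaced by a batched tensor implementation.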

摘要:

针对道路场景下红外目标遮挡、缺乏纹理细节而导致目标误检、漏检的问题,提出一种基于多尺度和加权坐标注意力的轻量化红外道路场景检测模型(MSC-YOLO)。以YOLOv7-tiny作为基线模型,首先,在MobileNetV3的不同中间特征层引入多尺度金字塔模块PSA(Pyramid Split Attention),设计一种多尺度特征提取的轻量化主干提取网络MSM-Net(Multi-Scale Mobile Network),解决固定大小卷积核造成的特征污染问题,提高对于不同尺度目标的细粒度提取能力;其次,在特征融合网络融入加权坐标注意力(WCA)机制,叠加从中间特征图垂直和水平空间方向上获取的目标位置信息,增强目标特征在不同维度上的融合能力;最后,替换定位损失函数为高效交并比(EIoU),分别计算预测框和真实框的长、宽影响因子,提高收敛速度。在FLIR数据集上进行验证实验,与YOLOv7-tiny模型相比,在mAP(IoU=0.5)仅降低0.7个百分点的前提下,MSC-YOLO的参数量减少67.3%,浮点运算次数减少54.6%,模型大小减小60.5%,帧率在RTX 2080Ti上达到101,在检测性能和轻量化上达到平衡,满足红外道路场景的实时检测需求。

关键词: 红外道路场景检测, 多尺度, 加权坐标注意力, 轻量化, 定位损失函数

CLC Number: