Journal of Computer Applications (《计算机应用》)


Lightweight infrared road scene detection model based on multi-scale and weighted coordinate attention

CHENG Xiaohui1, HUANG Yuntian2, ZHANG Ruifang2

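  1. College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China
    2. Guilin University of Technology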
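  • Received: 2023-06-19  Revised: 2023-09-12  Online: 2023-09-27  Published: 2023-09-27
  • Corresponding author: HUANG Yuntian
  • Funding:
    National Natural Science Foundation of China; Guangxi Innovation-Driven Development Special Fund Project; Guangxi Science and Technology Program Key Research and Development Project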





Abstract: To address false and missed detections caused by occlusion and the lack of texture detail in infrared targets in road scenes, a lightweight infrared road scene detection model based on multi-scale and weighted coordinate attention, named MSC-YOLO, was proposed. With YOLOv7-tiny (You Only Look Once) as the baseline model, firstly, the Pyramid Split Attention (PSA) module was introduced into different intermediate feature layers of the MobileNetv3 network, yielding a lightweight multi-scale backbone network named MSM-Net (Multi-scale Mobile Network). This design mitigates the feature pollution caused by fixed-size convolution kernels and improves the fine-grained extraction of targets at different scales. Secondly, Weighted Coordinate Attention (WCA) was integrated into the feature fusion network: target position information obtained along the vertical and horizontal spatial directions of the intermediate feature maps was superimposed to strengthen the fusion of target features across dimensions. Finally, the localization loss function was replaced with Efficient Intersection over Union (EIOU), which computes the width and height influence factors of the predicted box and the ground-truth box separately, accelerating convergence. Validation experiments were carried out on the FLIR dataset. Compared with the YOLOv7-tiny model, MSC-YOLO reduces the number of parameters by 67.3%, the number of floating-point operations by 54.6%, and the model size by 60.5%, while the mean Average Precision at an IOU threshold of 0.5 (mAP(IOU=0.5)) drops by only 0.7 percentage points. It reaches 101 Frames Per Second (FPS) on an RTX 2080Ti, balancing detection performance against model size and meeting the real-time detection requirements of infrared road scenes.

Keywords: infrared road scene detection, multi-scale, Weighted Coordinate Attention (WCA), lightweight, localization loss function
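The abstract states that the localization loss was replaced with EIOU, which penalizes width and height differences between the predicted and ground-truth boxes separately. As a rough illustration, the Python sketch below computes the EIOU loss for a single pair of boxes using the commonly cited EIOU definition: 1 minus IOU, plus a center-distance penalty normalized by the squared diagonal of the smallest enclosing box, plus separate width and height penalties normalized by that enclosing box's squared width and height. The `Box` class, the corner (x1, y1, x2, y2) format, and the `eps` term are assumptions made for illustration; this is not the authors' training code, which would operate on batched tensors inside the YOLOv7-tiny pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in (x1, y1, x2, y2) corner format."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def w(self) -> float:
        return self.x2 - self.x1

    @property
    def h(self) -> float:
        return self.y2 - self.y1

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """Sketch of the EIOU loss for one box pair (not the authors' code)."""
    # Overlap (IOU) term
    iw = max(0.0, min(pred.x2, gt.x2) - max(pred.x1, gt.x1))
    ih = max(0.0, min(pred.y2, gt.y2) - max(pred.y1, gt.y1))
    inter = iw * ih
    union = pred.w * pred.h + gt.w * gt.h - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes
    cw = max(pred.x2, gt.x2) - min(pred.x1, gt.x1)
    ch = max(pred.y2, gt.y2) - min(pred.y1, gt.y1)

    # Squared center distance, normalized by the enclosing box's squared diagonal
    (px, py), (gx, gy) = pred.center, gt.center
    center_term = ((px - gx) ** 2 + (py - gy) ** 2) / (cw ** 2 + ch ** 2 + eps)

    # Separate width and height penalties, each normalized by the enclosing box
    width_term = (pred.w - gt.w) ** 2 / (cw ** 2 + eps)
    height_term = (pred.h - gt.h) ** 2 / (ch ** 2 + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

For example, a 4×2 prediction `Box(0, 0, 4, 2)` against a 2×2 ground truth `Box(0, 0, 2, 2)` gives IOU = 0.5, a center term of 0.05 and a width term of 0.25, so `eiou_loss` returns about 0.8. Because the width and height gaps are penalized directly, rather than through an aspect-ratio term as in CIOU, the gradient pushes each side length toward its target independently, which is the property the abstract credits for faster convergence.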
