Lightweight infrared road scene detection model based on multiscale and weighted coordinate attention

Journal of Computer Applications, 2024, Vol. 44, Issue 6: 1927-1934. DOI: 10.11772/j.issn.1001-9081.2023060775

Special Issue: Multimedia computing and computer simulation

Xiaohui CHENG¹,², Yuntian HUANG¹, Ruifang ZHANG³

1. College of Computer Science and Engineering, Guilin University of Technology, Guilin Guangxi 541006, China
2. Guangxi Key Laboratory of Embedded Technology and Intelligent System (Guilin University of Technology), Guilin Guangxi 541006, China
3. College of Mechanical and Control Engineering, Guilin University of Technology, Guilin Guangxi 541006, China

Received: 2023-06-20; Revised: 2023-09-11; Accepted: 2023-09-12; Online: 2023-09-27; Published: 2024-06-10
Contact: Ruifang ZHANG
About author: CHENG Xiaohui, born in 1961, native of Zhangshu, Jiangxi, professor. His research interests include embedded systems, IoT, and artificial intelligence.
HUANG Yuntian, born in 1999, native of Guilin, Guangxi, M. S. candidate. His research interests include embedded systems and object detection.
Supported by:
Guangxi Innovation Driven Development Special Fund (Guike AA18118009)

Abstract

Occlusion and the lack of texture detail in infrared targets in road scenes lead to false detections and missed detections. To address this, a lightweight infrared road scene detection YOLO (You Only Look Once) model based on Multi-Scale and weighted Coordinate attention (MSC-YOLO) was proposed, with YOLOv7-tiny as the baseline model. Firstly, the multi-scale pyramid module PSA (Pyramid Split Attention) was introduced into different intermediate feature layers of MobileNetV3 to design MSM-Net (Multi-Scale Mobile Network), a lightweight backbone network for multi-scale feature extraction. This addresses the feature pollution caused by fixed-size convolution kernels and improves fine-grained feature extraction for targets of different scales. Secondly, a Weighted Coordinate Attention (WCA) mechanism was integrated into the feature fusion network. It superimposes the target position information obtained along the vertical and horizontal spatial directions of the intermediate feature map, enhancing the fusion of target features across dimensions. Finally, the positioning loss function was replaced with Efficient Intersection over Union (EIoU), which computes the width and height influence factors of the predicted box and the ground-truth box separately and accelerates convergence. Verification experiments were carried out on the FLIR dataset. Compared with YOLOv7-tiny, MSC-YOLO reduces the number of parameters by 67.3%, the number of floating-point operations by 54.6%, and the model size by 60.5%, while mAP(IoU=0.5) (mean Average Precision at an IoU threshold of 0.5) drops by only 0.7 percentage points. The model reaches 101 Frames Per Second (FPS) on an RTX 2080Ti, balancing detection performance against model size and meeting the real-time detection requirements of infrared road scenes.

Key words: infrared road scene detection, multi-scale, Weighted Coordinate Attention (WCA), lightweight, positioning loss function
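
As a rough illustration of the EIoU positioning loss named in the abstract, the sketch below follows the commonly cited formulation from reference 24: one minus IoU, plus a normalized center-distance term, plus separate width and height terms normalized by the smallest enclosing box. The function name `eiou_loss`, the (x1, y1, x2, y2) box format, and the `eps` constant are assumptions made for illustration; this is not the authors' training code.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: box format and eps are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences are penalized separately, each
    # normalized by the corresponding side of the enclosing box.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    # A prediction that is too wide but has the correct height.
    print(eiou_loss((0, 0, 4, 2), (0, 0, 2, 2)))
```

Because the width and height terms are separate, a box that is wrong in only one dimension is penalized only along that dimension, which is the property the abstract credits for faster convergence.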

CLC Number: TP391.41

Xiaohui CHENG, Yuntian HUANG, Ruifang ZHANG. Lightweight infrared road scene detection model based on multiscale and weighted coordinate attention[J]. Journal of Computer Applications, 2024, 44(6): 1927-1934.

程小辉, 黄云天, 张瑞芳. 基于多尺度和加权坐标注意力的轻量化红外道路场景检测模型[J]. 计算机应用, 2024, 44(6): 1927-1934.

References

1	LI Q L, ZHOU X W, WEI M E, et al. Infrared target detection algorithm based on strip pooling and attention mechanism in street scene [J]. Computer Engineering, 2023, 49(8): 310-320.
2	DAI X, YUAN X, WEI X. TIRNet: object detection in thermal infrared images for autonomous driving [J]. Applied Intelligence, 2020, 51(3): 1244-1261.
3	ZHANG H, LUO C, WANG Q, et al. A novel infrared video surveillance system using deep learning based techniques [J]. Multimedia Tools and Applications, 2018, 77: 26657-26676.
4	MURESAN M P, BREHAR R D, NEDEVSCHI S. Vision algorithms and embedded solution for pedestrian detection with far infrared camera [C]// Proceedings of the 2014 IEEE 10th International Conference on Intelligent Computer Communication and Processing. Piscataway: IEEE, 2014: 133-136.
5	VIOLA P, JONES M. Rapid object detection using a boosted cascade of simple features [C]// Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2001: 511-518.
6	AJAY A, DIXON K D M, SOWMYA V, et al. Aerial image classification using GURLS and LIBSVM [C]// Proceedings of the 2016 International Conference on Communication and Signal Processing. Piscataway: IEEE, 2016: 396-401.
7	HUANG D, WANG Y-H, WANG Y-D. A robust infrared face recognition method based on AdaBoost Gabor features [C]// Proceedings of the 2007 International Conference on Wavelet Analysis and Pattern Recognition. Piscataway: IEEE, 2007: 1114-1118.
8	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
9	REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. [2023-06-03].
10	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector [C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
11	LI S, LI Y, LI Y, et al. YOLO-FIRI: improved YOLOv5 for infrared image object detection [J]. IEEE Access, 2021, 9: 141861-141875.
12	ZHAO M, ZHANG H R. An infrared object detection method based on cross-domain fusion network [J]. Acta Photonica Sinica, 2021, 50(11): 111001.
13	HUANG L, YANG Y, YANG C Y, et al. FS-YOLOv5: lightweight infrared road target detection method [J]. Computer Engineering and Applications, 2023, 59(9): 215-224.
14	QIN P, TANG C M, LIU Y F, et al. Infrared target detection method based on improved YOLOv3 [J]. Computer Engineering, 2022, 48(3): 211-219.
15	SHEN H Y, YU H H, WANG H C, et al. Object detection algorithm of thermal infrared images based on improved YOLOX [J]. Electronic Measurement Technology, 2022, 45(23): 72-81.
16	HOWARD A, SANDLER M, CHU G, et al. Searching for MobileNetV3 [C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1314-1324.
17	ZHAO X, ZHANG L, PANG Y, et al. A single stream network for robust and real-time RGB-D salient object detection [C]// Proceedings of the 16th European Conference on Computer Vision. Cham: Springer, 2020: 646-662.
18	FENG M, LU H, DING E. Attentive feedback network for boundary-aware salient object detection [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1623-1632.
19	ZHANG H, ZU K, LU J, et al. EPSANet: an efficient pyramid squeeze attention block on convolutional neural network [C]// Proceedings of the 16th Asian Conference on Computer Vision. Cham: Springer, 2021: 541-557.
20	HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
21	HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
22	HOU Q, ZHOU D, FENG J. Coordinate attention for efficient mobile network design [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717.
23	TAN M, PANG R, LE Q V. EfficientDet: scalable and efficient object detection [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787.
24	ZHANG Y F, REN W, ZHANG Z, et al. Focal and efficient IOU loss for accurate bounding box regression [J]. Neurocomputing, 2021, 506: 146-157.

Name	Configuration
Operating system	Ubuntu 16.04
CPU	Intel Xeon Silver 4110 CPU @ 2.10 GHz
GPU	NVIDIA RTX 2080Ti (×4)
Deep learning framework	PyTorch 1.8.1, CUDA 10.1, cuDNN 7.6.4

Case	Parameters/10⁶	FLOPs/10⁹	mAP(IoU=0.5)/%	F1/%
1	2.15	5.1	76.5	73.6
2	2.21	5.6	77.0	74.6
3	2.19	5.5	76.6	73.7
4	2.29	5.9	77.7	75.2

Input	op	exp	out	k	se	s
640²×1	MB	—	16	3	—	2
320²×16	MB	16	16	3	—	1
320²×16	MB	64	24	3	—	2
160²×24	MB	72	24	3	—	1
160²×24	MB	72	40	5	√	2
80²×40	MBx	128	40	PSA	—	—
80²×40	MB	120	40	5	√	1
80²×40	MB	240	80	3	—	2
40²×80	MB	200	80	3	—	1
40²×80	MB	184	80	3	—	1
40²×80	MB	184	80	3	—	1
40²×80	MBx	256	112	PSA	—	—
40²×112	MB	672	112	3	√	1
40²×112	MB	672	160	3	√	2
20²×160	MBx	192	256	PSA	—	—

alpha	mAP(IoU=0.5)/%	P/%	R/%	F1/%
0.3	78.2	79.6	70.5	74.7
0.4	77.0	79.1	70.7	74.6
0.5	78.1	77.3	73.1	75.1
0.6	77.4	77.9	71.2	74.3
0.7	76.8	81.0	69.6	74.8

Model	M3	MSM	WCA	EIoU	Parameters/10⁶	FLOPs/10⁹	P/%	R/%	mAP(IoU=0.5)/%	F1/%
YOLOv7-tiny	×	×	×	×	6.01	13.0	81.7	71.8	78.9	76.4
A	√	×	×	×	3.68	6.3	80.8	67.2	76.3	73.3
B	×	√	×	×	2.25	5.9	80.3	70.8	77.7	75.2
C	×	√	√	×	2.29	5.9	77.3	73.1	78.1	75.1
D	×	√	√	√	2.29	5.9	80.2	71.0	78.2	75.3
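
The F1 column can be cross-checked against the P and R columns. The sketch below assumes F1 is the harmonic mean of the overall precision and recall; the source does not state whether F1 was instead computed per class and then averaged, which could explain small rounding differences. The function name `f1_score` and the `rows` dictionary are illustrative, with values copied from the ablation table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P/%, R/%, reported F1/%) for each row of the ablation table.
rows = {
    "YOLOv7-tiny": (81.7, 71.8, 76.4),
    "A": (80.8, 67.2, 73.3),
    "B": (80.3, 70.8, 75.2),
    "C": (77.3, 73.1, 75.1),
    "D": (80.2, 71.0, 75.3),
}

if __name__ == "__main__":
    for name, (p, r, reported) in rows.items():
        computed = f1_score(p, r)
        print(f"{name}: computed {computed:.2f}, reported {reported}")
```

Under this assumption every computed value lies within 0.1 of the reported F1.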

Model	Parameters/10⁶	FLOPs/10⁹	Size/MB	FPS	AP (Car)/%	AP (Bicycle)/%	AP (Person)/%	mAP(IoU=0.5)/%	F1/%
YOLOv3-tiny	8.67	12.9	16.61	507	86.3	52.5	75.1	71.3	70.8
YOLOv5s	7.01	15.8	13.76	155	90.3	62.6	83.0	78.6	76.2
ShuffleNet-YOLOv7-tiny	4.49	8.5	8.91	107	87.8	49.0	77.5	71.5	71.2
EfficientNet-YOLOv7-tiny	3.87	7.9	7.74	127	89.6	57.3	82.9	76.6	74.0
-YOLOv7-tiny	3.68	6.3	7.36	103	89.2	57.4	82.4	76.3	73.3
YOLOv7-tiny	6.01	13.0	11.72	156	91.3	61.5	83.8	78.9	76.4
YOLOv8n	3.01	8.1	5.96	169	89.3	56.8	81.3	75.8	72.8
FS-YOLOv5s [13]	5.20	—	10.70	—	89.1	59.2	81.5	76.6	—
Strip-YOLOs [1]	8.10	19.3	—	—	90.5	67.1	84.8	80.7	—
MSC-YOLO	2.29	5.9	4.63	101	89.2	62.3	83.1	78.2	75.3
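
The relative reductions quoted in the abstract can be recomputed from the YOLOv7-tiny and MSC-YOLO rows of the comparison table. The sketch below is a plain arithmetic check; the helper name `relative_reduction` and the dictionary keys are illustrative. With the table values, the FLOPs and model-size reductions come out at about 54.6% and 60.5%, matching the abstract, and the mAP gap is 0.7 percentage points. The parameter reduction computed from the table is about 61.9%, whereas the abstract reports 67.3%; the source does not say which parameter counts underlie that figure.

```python
def relative_reduction(baseline, new):
    """Percentage by which `new` is smaller than `baseline`."""
    return (baseline - new) / baseline * 100.0


# Values copied from the comparison table rows.
yolov7_tiny = {"params": 6.01, "flops": 13.0, "size_mb": 11.72, "map50": 78.9}
msc_yolo = {"params": 2.29, "flops": 5.9, "size_mb": 4.63, "map50": 78.2}

if __name__ == "__main__":
    for key in ("params", "flops", "size_mb"):
        drop = relative_reduction(yolov7_tiny[key], msc_yolo[key])
        print(f"{key}: {drop:.1f}% reduction")
    # mAP is compared in percentage points, not as a relative change.
    gap = yolov7_tiny["map50"] - msc_yolo["map50"]
    print(f"mAP(IoU=0.5) gap: {gap:.1f} percentage points")
```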
