Crowded pedestrian detection algorithm based on YOLOv5

doi:10.11772/j.issn.1001-9081.2024050733

Abstract

To address the low precision and high model complexity of crowded pedestrian detection algorithms, an improved crowded pedestrian detection algorithm, YOLOv5_CDA, is proposed based on YOLOv5. First, a C3CA module is designed in the backbone network, and the Coordinate Attention (CA) mechanism is introduced in the last layer to improve the network's ability to capture locally important features. Second, the α-IoU loss function is introduced to increase the model's focus on targets with high Intersection over Union (IoU), thereby improving bounding-box regression accuracy. Third, the detection scales in the neck network are changed to improve the detection of dense small targets. Finally, a decoupled head computes the different branches separately to improve detection accuracy. Experimental results on the representative pedestrian detection dataset WiderPerson show that YOLOv5_CDA achieves AP_0.5 and AP_0.5:0.95 of 90.3% and 63.7%, respectively, which are relative improvements of 1.7% and 3.2% over the YOLOv5 algorithm; the average missed detection rate decreases by 20% and the number of parameters decreases by 25.3%. The overall restructuring of the network therefore improves performance significantly without consuming excessive memory, making YOLOv5_CDA well suited to crowded pedestrian detection.

Key words: YOLOv5, crowded pedestrian detection, loss function, decoupled head, attention mechanism

CLC Number: TP391

Jun ZOU, Jun LI, Shiyi ZHANG. Crowded pedestrian detection algorithm based on YOLOv5[J]. Journal of Computer Applications, 0, (): 246-250.

Figures/Tables 13

Item	Configuration
Operating system	Windows 11
CPU	Intel Core i7-12700KF @ 3.60 GHz (16 GB RAM)
GPU	NVIDIA GeForce RTX 3050 (8 GB dedicated memory)
CUDA	CUDA 11.7
Deep learning framework	PyTorch 2.0.0
Python	Python 3.9.17

Detection model	AP_0.5/%	AP_0.5:0.95/%	MR/%	Params/10⁶	Model size/MB
YOLOv5n	87.2	58.8	23.8	1.87	8.84
YOLOv5_CDAn	88.5	60.5	19.7	1.54	3.90
YOLOv5s	88.8	61.7	22.0	7.02	13.70
YOLOv5_CDAs	90.3	63.7	17.6	5.23	10.80
YOLOv5m	89.3	63.5	22.1	20.87	40.20
YOLOv5_CDAm	91.0	64.9	15.8	16.10	33.30

$α$	P/%	R/%	F/%	Params/10⁶
1.0	88.1	78.0	82.7	7.02
1.5	87.6	81.2	84.3	7.02
2.0	87.7	81.6	84.5	7.02
2.5	87.3	81.1	84.1	7.02
3.0	86.8	81.9	84.3	7.02
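
The α values in the table above parameterize the α-IoU loss. The following is a minimal sketch, assuming the basic power form L = 1 - IoU^α; the source does not state which α-IoU variant or box format is used, so the function names and the (x1, y1, x2, y2) box convention here are illustrative assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alpha_iou_loss(pred, target, alpha=2.0):
    """Basic alpha-IoU loss (assumed form): 1 - IoU**alpha.

    alpha = 1 reduces to the plain IoU loss.
    """
    return 1.0 - box_iou(pred, target) ** alpha


# Two 2x2 boxes overlapping by half: IoU = 2 / 6 = 1/3
pred, target = (0, 0, 2, 2), (1, 0, 3, 2)
print(alpha_iou_loss(pred, target, alpha=1.0))
print(alpha_iou_loss(pred, target, alpha=2.0))
```

In this form the gradient magnitude with respect to IoU is α·IoU^(α-1), so with α = 2 a box whose IoU exceeds 0.5 receives a larger gradient than under the plain IoU loss, which is one way to read the abstract's statement that the loss increases the focus on high-IoU targets.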

Backbone improvement	P/%	R/%	F/%	Params/10⁶
Scheme 1	88.8	77.1	82.5	7.8
Scheme 2	88.9	77.3	82.7	6.7
Scheme 3	88.3	78.0	82.8	5.9
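
The F column in the table above is consistent with the harmonic mean of precision and recall, F = 2PR/(P + R). The following is a minimal check, assuming that definition (the source does not state the formula, and `f_score` is an illustrative helper, not a name from the paper).

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P/%, R/%, reported F/%) for the three backbone schemes in the table above
rows = [(88.8, 77.1, 82.5), (88.9, 77.3, 82.7), (88.3, 78.0, 82.8)]
for p, r, reported in rows:
    print(f"P={p} R={r} F={f_score(p, r):.1f} (reported {reported})")
```

All three computed values agree with the reported F values to one decimal place.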

No.	YOLOv5s (baseline)	C3CA module	Decoupled Head	Neck improvement	α-IoU	AP_0.5/%	AP_0.5:0.95/%	MR/%	Params/10⁶	Model size/MB
1	√					88.8	61.7	22.0	7.00	13.7
2	√	√				88.9	61.8	22.0	5.89	12.2
3	√		√			88.8	62.2	22.4	8.77	17.0
4	√			√		89.6	61.9	19.6	5.38	14.4
5	√				√	89.5	63.5	18.1	7.00	14.4
6	√			√	√	90.0	63.1	18.6	5.30	11.0
7	√	√		√		89.6	61.6	19.5	4.24	9.3
8	√	√		√	√	90.1	62.9	17.8	4.24	9.3
9	√		√	√	√	90.2	63.8	17.9	6.38	13.6
10	√	√	√	√	√	90.3	63.7	17.6	5.23	10.8
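
The percentages quoted in the abstract can be reproduced from row 1 (YOLOv5s baseline) and row 10 (all four modifications) of the ablation table above. The following is a small worked check, assuming the abstract's figures are changes relative to the baseline rather than percentage-point differences; `relative_change` is an illustrative helper.

```python
def relative_change(baseline, improved):
    """Signed change of `improved` relative to `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Row 1 (baseline) and row 10 (full YOLOv5_CDAs) of the ablation table
baseline = {"AP_0.5": 88.8, "AP_0.5:0.95": 61.7, "MR": 22.0, "Params": 7.00}
full_model = {"AP_0.5": 90.3, "AP_0.5:0.95": 63.7, "MR": 17.6, "Params": 5.23}

changes = {k: relative_change(baseline[k], full_model[k]) for k in baseline}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

Under that reading the results are +1.7%, +3.2%, -20.0%, and -25.3%, matching the abstract. In absolute terms the AP_0.5 and AP_0.5:0.95 gains are 1.5 and 2.0 percentage points.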

Detection model	AP_0.5/%	AP_0.5:0.95/%	MR/%	Params/10⁶
YOLOv3s	80.5	50.3	29.4	8.68
YOLOv3-tiny	84.7	58.3	25.5	12.12
YOLOv7s	87.9	57.3	20.0	9.14
YOLOv7-tiny	87.1	56.1	19.9	6.02
YOLOv8s	90.3	64.9	17.9	11.13
YOLOv5s	88.8	61.7	22.0	7.02
YOLOv9	91.5	67.0	17.3	60.10
YOLOv5_CDAs	90.3	63.7	17.6	5.23
