Multi-scale object detection algorithm based on improved YOLOv3

Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2423-2431. DOI: 10.11772/j.issn.1001-9081.2021060984

Special topic: Artificial Intelligence

Liying ZHANG¹, Chunjiang PANG¹, Xinying WANG¹, Guoliang LI²

^1. Department of Computer, North China Electric Power University (Baoding), Baoding Hebei 071003, China
^2. Zaozhuang Power Supply Company, State Grid Shandong Electric Power Company, Zaozhuang Shandong 277800, China

Received: 2021-06-10 Revised: 2021-09-28 Accepted: 2021-10-12 Online: 2021-12-27 Published: 2022-08-10
Contact: Xinying WANG
About the authors: ZHANG Liying, born in 1996, female, from Baoding, Hebei, M. S. candidate. Her research interests include image processing and deep learning.
PANG Chunjiang, born in 1965, male, from Baoding, Hebei, M. S., associate professor. His research interests include graphics and image processing, and deep learning.
WANG Xinying, born in 1984, male, from Baoding, Hebei, M. S., lecturer. His research interests include artificial intelligence.
LI Guoliang, born in 1981, male, from Anxin, Hebei, M. S., lecturer. His research interests include computer vision and knowledge graphs.
Supported by:
Fundamental Research Funds for the Central Universities (2021MS090); Science and Technology Project of Zaozhuang Power Supply Company of State Grid Shandong Electric Power Company (SD20-GC-ZB003-SGZH-KJ)

Abstract

To further improve the speed and precision of multi-scale object detection, and to address the missed detections, false detections and repeated detections that small objects tend to cause, an object detection algorithm based on improved You Only Look Once v3 (YOLOv3) was proposed for the automatic detection of multi-scale objects. Firstly, the structure of the feature extraction network was improved: an attention mechanism was introduced into the spatial dimensions of the residual module so that the network pays attention to small objects. Then, Dense Convolutional Network (DenseNet) was used to fully integrate the shallow information of the network, and depthwise separable convolution replaced the ordinary convolution in the backbone network, thereby reducing the number of model parameters and improving the detection speed. In the feature fusion network, bidirectional fusion of shallow and deep features was realized through a bidirectional feature pyramid structure, and the 3-scale prediction was changed to 4-scale prediction, which improved the ability to learn multi-scale features. In terms of the loss function, Generalized Intersection over Union (GIoU) was selected as the loss function, which increased the precision of object recognition and reduced the object miss rate. Experimental results show that on the Pascal VOC test set, the mean Average Precision (mAP) of the improved YOLOv3 algorithm reaches 83.26%, which is 5.89 percentage points higher than that of the original YOLOv3 algorithm, with a detection speed of 22.0 frame/s. On the Common Objects in COntext (COCO) dataset, the mAP of the improved algorithm is 3.28 percentage points higher than that of the original YOLOv3 algorithm. The mAP also improves in multi-scale object detection, which verifies the effectiveness of the object detection algorithm based on the improved YOLOv3.

Key words: object detection, YOLOv3 (You Only Look Once v3), multi-scale object, bidirectional feature pyramid, attention mechanism

CLC number: TP391.41
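
As an illustration of the GIoU loss named in the abstract, the following is a minimal Python sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates and the standard definition GIoU = IoU − (|C| − |A ∪ B|) / |C|, where C is the smallest box enclosing both A and B. The names `giou` and `giou_loss` are illustrative and not taken from the paper, and the loss is taken as 1 − GIoU, which is the common formulation and may differ in detail from the authors' implementation.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention).
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; an empty or inverted box has area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two boxes, a value in (-1, 1]."""
    # Intersection rectangle (may be empty, in which case its area is 0).
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclosing = area((min(a[0], b[0]), min(a[1], b[1]),
                      max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0.0 or enclosing <= 0.0:
        raise ValueError("boxes must have positive area")
    iou = inter / union
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclosing - union) / enclosing


def giou_loss(pred: Box, target: Box) -> float:
    """Loss that is 0 for a perfect match and grows as boxes drift apart."""
    return 1.0 - giou(pred, target)


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike plain IoU, which is 0 for every pair of non-overlapping boxes, GIoU keeps decreasing as disjoint boxes move further apart, so the loss still provides a useful signal when a prediction does not overlap its target.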

Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI. Multi-scale object detection algorithm based on improved YOLOv3[J]. Journal of Computer Applications, 2022, 42(8): 2423-2431.

References (24)

1	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. DOI: 10.1109/cvpr.2014.81
2	GIRSHICK R. Fast R-CNN [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. DOI: 10.1109/iccv.2015.169
3	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
4	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
5	REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
6	REDMON J, FARHADI A. YOLOv3: an incremental improvement [EB/OL]. (2018-04-08) [2021-03-20].
7	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection [EB/OL]. (2020-04-23) [2021-03-20].
8	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
9	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector [EB/OL]. (2017-01-23) [2021-03-05].
10	LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 404-419.
11	ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points [EB/OL]. (2019-04-25) [2021-05-06].
12	LIU X N, WANG Z P, HE Y T, et al. Research on small target detection based on deep learning [J]. Tactical Missile Technology, 2019(1): 100-107. (in Chinese)
13	MA Q M, WANG M J, LIANG H R. License plate location detection algorithm based on improved YOLOv3 in complex scenes [J]. Computer Engineering and Applications, 2021, 57(7): 198-208. (in Chinese)
14	LIU D, WU Y J, LUO N C, et al. Object detection of Gaussian-YOLO v3 implanting attention and feature intertwine modules [J]. Journal of Computer Applications, 2020, 40(8): 2225-2230. DOI: 10.11772/j.issn.1001-9081.2020010030 (in Chinese)
15	XU T, TANG G J, LIU Q P, et al. Improved YOLOv3 based on dilated convolution and Focal Loss [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2020, 40(6): 100-108. DOI: 10.14132/j.cnki.1673-5439.2020.06.015 (in Chinese)
16	TIAN D X, LIN C M, ZHOU J S, et al. SA-YOLOv3: an efficient and accurate object detector using self-attention mechanism for autonomous driving [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4099-4110. DOI: 10.1109/tits.2020.3041278
17	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. DOI: 10.1109/iccv.2017.324
18	REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. DOI: 10.1109/cvpr.2019.00075
19	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
20	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. DOI: 10.1109/cvpr.2017.243
21	HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
22	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge [J]. International Journal of Computer Vision, 2010, 88(2): 303-338. DOI: 10.1007/s11263-009-0275-4
23	HUAN H, CHEN Y F, ZHANG L, et al. An improved BR-YOLOv3 object detection network [J]. Computer Engineering, 2021, 47(10): 186-193. DOI: 10.19678/j.issn.1000-3428.0059234 (in Chinese)
24	LIU Z Y, YUAN L, ZHU M C, et al. YOLOv3 traffic sign detection based on SPP and improved FPN [J]. Computer Engineering and Applications, 2021, 57(7): 164-170. (in Chinese)

Type	Filters	Size	Output
Convolutional	32	3×3	416×416
Convolutional	64	3×3	208×208
Convolutional	32	1×1	208×208
Convolutional	64	3×3	208×208
DenseNet unit
Convolutional	64	1×1
Average Pooling			104×104
Convolutional	64	1×1	104×104
Convolutional	128	3×3	104×104
DenseNet unit
Convolutional	128	1×1
Convolutional	256	3×3/2	52×52
Convolutional	128	1×1
DW Conv	256	3×3
SE Block
Convolutional	128	1×1
Residual			52×52
Convolutional	256	1×1
Convolutional	512	3×3/2	26×26
Convolutional	256	1×1
DW Conv	512	3×3
SE Block
Convolutional	256	1×1
Residual			26×26
Convolutional	512	1×1
Convolutional	1024	3×3/2	13×13
Convolutional	512	1×1
DW Conv	1024	3×3
SE Block
Convolutional	512	1×1
Residual			13×13
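
To make the parameter saving from the DW Conv layers in the table above concrete, here is a rough Python sketch that compares weight counts for a standard 3×3 convolution and a depthwise separable one. It assumes the usual construction (a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution) and ignores bias and batch-normalization parameters. The channel pairs are read off the table (for example, 128 input channels into a 256-filter DW Conv), and the function names are hypothetical; the authors' exact layer layout may differ.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a k x k standard convolution (bias and batch norm ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # (input channels, output channels) for the three DW Conv stages,
    # assumed from the 1x1 layer that precedes each DW Conv in the table.
    for c_in, c_out in [(128, 256), (256, 512), (512, 1024)]:
        std = standard_conv_params(c_in, c_out)
        dws = depthwise_separable_params(c_in, c_out)
        print(f"{c_in}->{c_out}: standard={std}, "
              f"separable={dws}, ratio={std / dws:.2f}")
```

Under these assumptions the 128-to-256 stage drops from 294912 weights to 33920, a reduction of roughly 8.7 times, which is in line with the parameter reduction the abstract attributes to depthwise separable convolution.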

Item	Configuration
Programming language	Python
Deep learning framework	PyTorch
Operating system	Windows 10
CPU	Intel Core i5-8500
RAM	16 GB
GPU	NVIDIA GeForce GTX 2070
CUDA	10.1

Class	AP/% (IoU=0.5)
Class	YOLOv3	Tiny-YOLOv3	Proposed algorithm
aero	81.23	65.37	89.64
bike	80.26	70.24	88.31
bird	73.97	43.89	81.07
boat	65.46	47.68	67.59
bottle	64.12	24.97	68.22
bus	81.53	68.96	85.21
car	82.15	74.71	88.49
cat	83.14	65.73	87.02
chair	61.28	33.40	60.28
cow	77.33	53.72	84.42
table	75.58	49.11	75.66
dog	82.19	61.19	87.99
horse	84.69	75.34	86.72
mbike	81.29	72.13	85.33
person	78.46	69.10	86.81
plant	52.18	26.90	47.01
sheep	77.52	59.22	78.62
sofa	74.41	50.90	82.56
train	81.66	75.03	83.33
tv	71.99	60.80	76.09

IoU threshold	AP/%
IoU threshold	YOLOv3	Proposed algorithm
mAP	31.00	34.28
0.50	55.30	55.88
0.55	53.40	54.01
0.60	48.80	49.20
0.65	44.01	46.26
0.70	39.03	41.63
0.75	33.84	35.59
0.80	23.43	26.00
0.85	10.62	16.29
0.90	5.00	7.03
0.95	0.52	0.94

Scale	YOLOv3			Proposed algorithm
Scale	mAP	Precision	Recall	mAP	Precision	Recall
(0, 110]	69.28	59.61	66.92	75.66	71.45	73.25
(110, 230]	82.47	74.10	83.56	87.70	82.73	83.45
(230, 400)	84.75	75.44	84.72	88.19	81.64	85.68

Algorithm	mAP/%
Faster R-CNN	73.32
SSD	72.66
Effi-YOLOv3	73.28
Algorithm of Ref. [23]	79.24
Algorithm of Ref. [24]	81.50
SSD+BiFPN+SENet	80.24
Proposed algorithm	83.26

Group	Improvement					Precision/%			mAP/%	Speed/(frame·s^-1)
Group	A	B	C	D	E	Small-scale objects	Medium-scale objects	Large-scale objects	mAP/%	Speed/(frame·s^-1)
1						69.28	82.47	84.75	76.37	18.0
2	√					70.34	82.21	83.23	75.79	16.1
3	√	√				72.09	83.33	86.79	78.85	20.9
4	√	√	√			72.45	84.10	86.89	79.24	21.2
5	√	√	√	√		73.20	85.67	87.46	82.69	20.7
6	√	√	√	√	√	75.66	87.70	88.19	83.26	22.0
