Multi-scale object detection algorithm based on improved YOLOv3

Journal of Computer Applications, 2022, 42(8): 2423-2431. DOI: 10.11772/j.issn.1001-9081.2021060984

Special Issue: Artificial intelligence

Liying ZHANG¹, Chunjiang PANG¹, Xinying WANG¹, Guoliang LI²

¹ Department of Computer, North China Electric Power University (Baoding), Baoding Hebei 071003, China
² Zaozhuang Power Supply Company, State Grid Shandong Electric Power Company, Zaozhuang Shandong 277800, China

Received: 2021-06-10; Revised: 2021-09-28; Accepted: 2021-10-12; Online: 2021-12-27; Published: 2022-08-10
Contact: Xinying WANG
About the authors: ZHANG Liying, born in 1996, native of Baoding, Hebei, M. S. candidate. Her research interests include image processing and deep learning.
PANG Chunjiang, born in 1965, native of Baoding, Hebei, M. S., associate professor. His research interests include graphics and image processing, and deep learning.
WANG Xinying, born in 1984, native of Baoding, Hebei, M. S., lecturer. His research interests include artificial intelligence.
LI Guoliang, born in 1981, native of Anxin, Hebei, M. S., lecturer. His research interests include computer vision and knowledge graphs.
Supported by: Fundamental Research Funds for the Central Universities (2021MS090); Science and Technology Project of Zaozhuang Power Supply Company of State Grid Shandong Electric Power Company (SD20-GC-ZB003-SGZH-KJ)

Abstract

To further improve the speed and precision of multi-scale object detection, and to address the missed, false and repeated detections that small objects tend to cause, an object detection algorithm based on improved You Only Look Once v3 (YOLOv3) was proposed to detect multi-scale objects automatically. Firstly, the structure of the feature extraction network was improved: an attention mechanism was introduced into the spatial dimensions of the residual module so that the network pays more attention to small objects. Then, Dense Convolutional Network (DenseNet) was used to fully integrate the shallow information of the network, and depthwise separable convolution replaced the standard convolution in the backbone network, reducing the number of model parameters and improving the detection speed. In the feature fusion network, a bidirectional feature pyramid structure fused shallow and deep features in both directions, and the 3-scale prediction was changed to 4-scale prediction, improving the ability to learn multi-scale features. For the loss function, Generalized Intersection over Union (GIoU) was adopted, which increased the precision of object identification and reduced the object miss rate. Experimental results show that on the Pascal VOC test set, the mean Average Precision (mAP) of the improved YOLOv3 algorithm reaches 83.26%, 5.89 percentage points higher than that of the original YOLOv3 algorithm, at a detection speed of 22.0 frame/s. On the Common Objects in COntext (COCO) dataset, the improved algorithm raises the mAP by 3.28 percentage points over the original YOLOv3 algorithm. In multi-scale object detection, the mAP is also improved at every object scale, which verifies the effectiveness of the object detection algorithm based on improved YOLOv3.

Key words: object detection, YOLOv3 (You Only Look Once v3), multi-scale object, bidirectional feature pyramid, attention mechanism
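The GIoU loss mentioned in the abstract can be illustrated with a short sketch. Following the definition in [18], GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing the prediction A and the ground truth B, and the loss is 1 - GIoU. The pure-Python version below assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the helper names `area`, `giou` and `giou_loss` are hypothetical, and the code illustrates the formula rather than reproducing the authors' training code.

```python
from typing import Tuple

# Axis-aligned box as corner coordinates (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate (inverted) boxes count as zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two boxes, in the range (-1, 1]."""
    # Intersection rectangle; area() returns 0 when the boxes do not overlap.
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    iou = inter / union
    # Smallest box C enclosing both inputs.
    c = area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    # Penalize the part of C that is covered by neither A nor B.
    return iou - (c - union) / c


def giou_loss(pred: Box, target: Box) -> float:
    """Bounding-box regression loss 1 - GIoU, in the range [0, 2)."""
    return 1.0 - giou(pred, target)


if __name__ == "__main__":
    print(giou_loss((0, 0, 1, 1), (0, 0, 1, 1)))  # perfect match -> 0.0
    print(giou((0, 0, 1, 1), (2, 0, 3, 1)))       # disjoint, close -> about -0.333
    print(giou((0, 0, 1, 1), (4, 0, 5, 1)))       # disjoint, farther apart -> -0.6
```

Both disjoint pairs have IoU = 0, so an IoU-based loss cannot distinguish them, whereas GIoU becomes more negative as the boxes move apart. This is why GIoU still provides a training signal when a predicted box does not overlap its target at all.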

CLC Number: TP391.41

Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI. Multi-scale object detection algorithm based on improved YOLOv3[J]. Journal of Computer Applications, 2022, 42(8): 2423-2431.

References

1	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. DOI: 10.1109/cvpr.2014.81
2	GIRSHICK R. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. DOI: 10.1109/iccv.2015.169
3	REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
4	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
5	REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
6	REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2021-03-20]. .
7	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2021-03-20]. .
8	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
9	FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. (2017-01-23)[2021-03-05]. .
10	LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 404-419.
11	ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[EB/OL]. (2019-04-25)[2021-05-06]. .
12	LIU X N, WANG Z P, HE Y T, et al. Research on small target detection based on deep learning[J]. Tactical Missile Technology, 2019(1): 100-107.
13	MA Q M, WANG M J, LIANG H R. License plate location detection algorithm based on improved YOLOv3 in complex scenes[J]. Computer Engineering and Applications, 2021, 57(7): 198-208.
14	LIU D, WU Y J, LUO N C, et al. Object detection of Gaussian-YOLO v3 implanting attention and feature intertwine modules[J]. Journal of Computer Applications, 2020, 40(8): 2225-2230. DOI: 10.11772/j.issn.1001-9081.2020010030
15	XU T, TANG G J, LIU Q P, et al. Improved YOLOv3 based on dilated convolution and Focal Loss[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2020, 40(6): 100-108. DOI: 10.14132/j.cnki.1673-5439.2020.06.015
16	TIAN D X, LIN C M, ZHOU J S, et al. SA-YOLOv3: an efficient and accurate object detector using self-attention mechanism for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4099-4110. DOI: 10.1109/tits.2020.3041278
17	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. DOI: 10.1109/iccv.2017.324
18	REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. DOI: 10.1109/cvpr.2019.00075
19	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
20	HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. DOI: 10.1109/cvpr.2017.243
21	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
22	EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. DOI: 10.1007/s11263-009-0275-4
23	HUAN H, CHEN Y F, ZHANG L, et al. An improved BR-YOLOv3 object detection network[J]. Computer Engineering, 2021, 47(10): 186-193. DOI: 10.19678/j.issn.1000-3428.0059234
24	LIU Z Y, YUAN L, ZHU M C, et al. YOLOv3 traffic sign detection based on SPP and improved FPN[J]. Computer Engineering and Applications, 2021, 57(7): 164-170.

Type	Filters	Size	Output
Convolutional	32	3×3	416×416
Convolutional	64	3×3	208×208
Convolutional	32	1×1	208×208
Convolutional	64	3×3	208×208
DenseNet unit
Convolutional	64	1×1
Average Pooling			104×104
Convolutional	64	1×1	104×104
Convolutional	128	3×3	104×104
DenseNet unit
Convolutional	128	1×1
Convolutional	256	3×3/2	52×52
Convolutional	128	1×1
DW Conv	256	3×3
SE Block
Convolutional	128	1×1
Residual			52×52
Convolutional	256	1×1
Convolutional	512	3×3/2	26×26
Convolutional	256	1×1
DW Conv	512	3×3
SE Block
Convolutional	256	1×1
Residual			26×26
Convolutional	512	1×1
Convolutional	1 024	3×3/2	13×13
Convolutional	512	1×1
DW Conv	1 024	3×3
SE Block
Convolutional	512	1×1
Residual			13×13
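The parameter savings from replacing standard convolutions with depthwise separable ones, as listed in the "DW Conv" rows above, can be checked with simple arithmetic. The sketch below assumes that each "DW Conv" row stands for a 3×3 depthwise convolution applied to the output of the preceding 1×1 layer, followed by a 1×1 pointwise convolution to the listed filter count (for example 512 to 1024 channels), and it ignores biases and batch-normalization parameters; the table does not spell out these details, and the function names are hypothetical. Under these assumptions, the separable layer needs 1/C_out + 1/k² of the standard layer's weights, roughly 11% for k = 3.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution from c_in to c_out channels (no bias, no BN)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels into c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # (input, output) channel pairs assumed for the three "DW Conv" rows of the table.
    for c_in, c_out in [(128, 256), (256, 512), (512, 1024)]:
        std = standard_conv_params(c_in, c_out, 3)
        sep = depthwise_separable_params(c_in, c_out, 3)
        print(f"{c_in}->{c_out}: standard {std:,}, separable {sep:,}, ratio {sep / std:.3f}")
```

For the 512 to 1024 layer this gives 4,718,592 versus 528,896 weights (ratio about 0.112). These are per-layer estimates under the stated assumptions, not the model's total parameter count.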

Item	Specification
Programming language	Python
Deep learning framework	PyTorch
Operating system	Windows 10
CPU	Intel Core i5-8500
RAM	16 GB
GPU	NVIDIA GeForce RTX 2070
CUDA	10.1

Class	AP (IoU = 0.5)/%
Class	YOLOv3	Tiny-YOLOv3	Proposed algorithm
aero	81.23	65.37	89.64
bike	80.26	70.24	88.31
bird	73.97	43.89	81.07
boat	65.46	47.68	67.59
bottle	64.12	24.97	68.22
bus	81.53	68.96	85.21
car	82.15	74.71	88.49
cat	83.14	65.73	87.02
chair	61.28	33.40	60.28
cow	77.33	53.72	84.42
table	75.58	49.11	75.66
dog	82.19	61.19	87.99
horse	84.69	75.34	86.72
mbike	81.29	72.13	85.33
person	78.46	69.10	86.81
plant	52.18	26.90	47.01
sheep	77.52	59.22	78.62
sofa	74.41	50.90	82.56
train	81.66	75.03	83.33
tv	71.99	60.80	76.09
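Each per-class value above is an average precision (AP) at an IoU threshold of 0.5, and the VOC mAP is the mean of these values over the 20 classes [22]. The sketch below computes AP for one class from detections that are already sorted by confidence and labeled as true or false positives. It assumes the all-point interpolation used by the VOC toolkit since 2010; the text does not say whether the 11-point VOC2007 variant was used instead. The function names and the input ranking are hypothetical.

```python
from typing import List


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class (PASCAL VOC 2010+ convention).

    is_tp: detections sorted by descending confidence; True means the detection
    matched a not-yet-matched ground-truth box with IoU >= 0.5.
    num_gt: number of ground-truth boxes of this class (assumed > 0).
    """
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each rank becomes the best precision at this or any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated curve: precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    # Hypothetical ranking for one class with 4 ground-truth boxes.
    ap = average_precision([True, False, True, True, False], num_gt=4)
    print(ap)                                  # 0.625
    print(mean_average_precision([ap, 1.0]))   # 0.8125
```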

IoU threshold	AP/%
IoU threshold	YOLOv3	Proposed algorithm
mAP	31.00	34.28
0.50	55.30	55.88
0.55	53.40	54.01
0.60	48.80	49.20
0.65	44.01	46.26
0.70	39.03	41.63
0.75	33.84	35.59
0.80	23.43	26.00
0.85	10.62	16.29
0.90	5.00	7.03
0.95	0.52	0.94
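The rows above give AP at each COCO IoU threshold from 0.50 to 0.95 in steps of 0.05. COCO's headline metric, AP@[.50:.95], is the unweighted mean over these ten thresholds, where the AP at each threshold is itself already averaged over classes. The small sketch below (plain Python, helper names hypothetical) makes the thresholds explicit and shows why AP falls as the threshold rises: a detection that overlaps its ground truth with IoU 0.72 counts as a true positive at only the first five thresholds.

```python
from typing import Dict, List

# COCO IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
IOU_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou: float) -> List[float]:
    """Thresholds at which a detection with this IoU counts as a true positive."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


def ap_50_95(ap_at: Dict[float, float]) -> float:
    """AP@[.50:.95]: unweighted mean of the (class-averaged) AP at each threshold."""
    return sum(ap_at[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


if __name__ == "__main__":
    print(IOU_THRESHOLDS)
    print(thresholds_passed(0.72))  # [0.5, 0.55, 0.6, 0.65, 0.7]
```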

Scale	YOLOv3			Proposed algorithm
Scale	mAP/%	Precision/%	Recall/%	mAP/%	Precision/%	Recall/%
(0, 110]	69.28	59.61	66.92	75.66	71.45	73.25
(110, 230]	82.47	74.10	83.56	87.70	82.73	83.45
(230, 400)	84.75	75.44	84.72	88.19	81.64	85.68
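The scale table groups objects into three size intervals and reports mAP, precision and recall within each. The sketch below shows that bookkeeping with hypothetical helpers. The text does not state how object size is measured, so using the square root of the box area in pixels is an assumption here. Precision and recall follow their standard definitions, TP/(TP + FP) and TP/(TP + FN).

```python
import math
from typing import Optional, Tuple


def scale_bucket(w: float, h: float) -> Optional[str]:
    """Assign a box to one of the table's size intervals.

    Size is measured here as sqrt(w * h) in pixels, which is an assumption;
    the interval bounds (0, 110], (110, 230] and (230, 400) come from the table.
    """
    size = math.sqrt(w * h)
    if 0 < size <= 110:
        return "small"
    if 110 < size <= 230:
        return "medium"
    if 230 < size < 400:
        return "large"
    return None  # outside every interval listed in the table


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(scale_bucket(60, 80), scale_bucket(150, 200), scale_bucket(300, 320))
    print(precision_recall(tp=8, fp=2, fn=4))
```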

Algorithm	mAP/%
Faster R-CNN	73.32
SSD	72.66
Effi-YOLOv3	73.28
Algorithm in Ref. [23]	79.24
Algorithm in Ref. [24]	81.50
SSD+BiFPN+SENet	80.24
Proposed algorithm	83.26
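The SE Block rows in the backbone table and the SSD+BiFPN+SENet baseline above both rely on Squeeze-and-Excitation attention [21]. A minimal pure-Python sketch of one SE block follows: global average pooling squeezes each channel to a scalar, two fully connected layers (reduction, ReLU, expansion, sigmoid) produce one gate per channel, and each channel is rescaled by its gate. The weights, the reduction ratio r and the helper names are hypothetical; the sketch illustrates the standard SE design from [21], not necessarily the exact attention module used in the paper.

```python
import math
from typing import List

Channel = List[List[float]]      # one H x W feature map
FeatureMap = List[Channel]       # C x H x W
Weights = List[List[float]]      # rows = output units, columns = input units


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(w: Weights, b: List[float], x: List[float]) -> List[float]:
    """Fully connected layer: y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def se_block(x: FeatureMap, w1: Weights, b1: List[float],
             w2: Weights, b2: List[float]) -> FeatureMap:
    """Squeeze-and-Excitation: reweight each channel of x by a learned gate.

    w1 has shape (C/r) x C and w2 has shape C x (C/r) for reduction ratio r.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate in (0, 1) per channel.
    hidden = [max(0.0, v) for v in dense(w1, b1, z)]
    gates = [sigmoid(v) for v in dense(w2, b2, hidden)]
    # Scale: multiply every spatial position of channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


if __name__ == "__main__":
    # Toy 2-channel 2 x 2 map with r = 2 and hand-picked (hypothetical) weights.
    x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    out = se_block(x, w1=[[1.0, 1.0]], b1=[0.0], w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
    print(out[0][0][0], out[1][0][0])  # channel 0 kept (~0.982), channel 1 damped (~0.054)
```

Because the gates come from globally pooled statistics, the block adds only two small fully connected layers per stage, which is why it pairs well with a lightweight depthwise separable backbone.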

Group	Improvement					Accuracy/%			mAP/%	Speed/(frame·s^-1)
Group	A	B	C	D	E	Small-scale objects	Medium-scale objects	Large-scale objects	mAP/%	Speed/(frame·s^-1)
1						69.28	82.47	84.75	76.37	18.0
2	√					70.34	82.21	83.23	75.79	16.1
3	√	√				72.09	83.33	86.79	78.85	20.9
4	√	√	√			72.45	84.10	86.89	79.24	21.2
5	√	√	√	√		73.20	85.67	87.46	82.69	20.7
6	√	√	√	√	√	75.66	87.70	88.19	83.26	22.0
