Journal of Computer Applications, 2022, Vol. 42, Issue 8: 2423-2431. DOI: 10.11772/j.issn.1001-9081.2021060984
Topic: Artificial intelligence
Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI
Received: 2021-06-10
Revised: 2021-09-28
Accepted: 2021-10-12
Online: 2021-12-27
Published: 2022-08-10
Contact: Xinying WANG
About author: ZHANG Liying, born in 1996, female, from Baoding, Hebei, M. S. candidate. Her research interests include image processing and deep learning.
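Abstract:
To further improve the speed and accuracy of multi-scale object detection and to address the missed, false, and repeated detections that small objects tend to cause, an object detection algorithm based on improved YOLOv3 (You Only Look Once v3) was proposed to detect multi-scale objects automatically. Firstly, the structure of the feature extraction network was improved: an attention mechanism was introduced into the spatial dimension of the residual module to focus on small objects. Then, a Dense Convolutional Network (DenseNet) was used to fully fuse the shallow-layer information of the network, and the ordinary convolutions in the backbone network were replaced with depthwise separable convolutions to reduce the number of model parameters and increase the detection speed. In the feature fusion network, a bidirectional pyramid structure was used to fuse deep and shallow features in both directions, and the 3-scale prediction was extended to 4-scale prediction, which improved the ability to learn multi-scale features. For the loss function, GIoU (Generalized Intersection over Union) was adopted to improve the precision of object recognition and reduce the missed detection rate. Experimental results show that on the Pascal VOC test set the algorithm based on improved YOLOv3 achieves a mean Average Precision (mAP) of 83.26%, which is 5.89 percentage points higher than that of the original YOLOv3, with a detection speed of 22.0 frame/s. On the COCO dataset, its mAP is 3.28 percentage points higher than that of the original YOLOv3. The mAP also improves in multi-scale object detection, which verifies the effectiveness of the algorithm.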
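The GIoU loss named in the abstract can be illustrated with a minimal sketch. It follows the general GIoU definition (IoU minus the fraction of the smallest enclosing box not covered by the union) for two axis-aligned boxes. The function names `giou` and `giou_loss`, the `(x1, y1, x2, y2)` box format, and the `1 - GIoU` loss form are illustrative assumptions, not the authors' implementation.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area (hypothetical helper for illustration).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle; width/height clamp to 0 when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box C that encloses both boxes.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    area_c = enclose_w * enclose_h

    # GIoU penalizes the part of C that the union leaves empty.
    return iou - (area_c - union) / area_c


def giou_loss(pred_box, target_box):
    """Loss in [0, 2]: 0 for a perfect match, above 1 for well-separated boxes."""
    return 1.0 - giou(pred_box, target_box)


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes still get a graded loss
```

Unlike plain IoU, which is 0 for every pair of non-overlapping boxes, the enclosing-box term keeps the loss sensitive to how far apart the boxes are, which is one common motivation for using it in bounding-box regression.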
Cite this article: Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI. Multi-scale object detection algorithm based on improved YOLOv3[J]. Journal of Computer Applications, 2022, 42(8): 2423-2431.
Type | Filters | Size | Output |
---|---|---|---|
Convolutional | 32 | 3×3 | 416×416 |
Convolutional | 64 | 3×3 | 208×208 |
Convolutional | 32 | 1×1 | 208×208 |
Convolutional | 64 | 3×3 | 208×208 |
DenseNet unit | | | |
Convolutional | 64 | 1×1 | |
Average Pooling | | | 104×104 |
Convolutional | 64 | 1×1 | 104×104 |
Convolutional | 128 | 3×3 | 104×104 |
DenseNet unit | | | |
Convolutional | 128 | 1×1 | |
Convolutional | 256 | 3×3/2 | 52×52 |
Convolutional | 128 | 1×1 | |
DW Conv | 256 | 3×3 | |
SE Block | | | |
Convolutional | 128 | 1×1 | |
Residual | | | 52×52 |
Convolutional | 256 | 1×1 | |
Convolutional | 512 | 3×3/2 | 26×26 |
Convolutional | 256 | 1×1 | |
DW Conv | 512 | 3×3 | |
SE Block | | | |
Convolutional | 256 | 1×1 | |
Residual | | | 26×26 |
Convolutional | 512 | 1×1 | |
Convolutional | 1024 | 3×3/2 | 13×13 |
Convolutional | 512 | 1×1 | |
DW Conv | 1024 | 3×3 | |
SE Block | | | |
Convolutional | 512 | 1×1 | |
Residual | | | 13×13 |
Tab. 1 Improved backbone network
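To make the parameter saving from depthwise separable convolution concrete, the following sketch counts weights for one stage of the backbone table, where a 128-channel input feeds a 3×3 layer with 256 filters. It assumes that "DW Conv" denotes a depthwise convolution followed by a 1×1 pointwise convolution and ignores biases and batch-normalization parameters; the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 128, 256, 3  # assumed reading of the 52x52 stage in the table
    standard = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    print(f"standard:  {standard}")
    print(f"separable: {separable}")
    print(f"ratio:     {separable / standard:.3f}")  # equals 1/c_out + 1/k**2
```

Under these assumptions the separable form needs roughly one ninth of the weights of the standard 3×3 layer, which is consistent with the abstract's stated goal of reducing the parameter count.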
Item | Specification |
---|---|
Programming language | Python |
Deep learning framework | PyTorch |
Operating system | Windows 10 |
CPU | Intel Core i5-8500 |
RAM | 16 GB |
GPU | NVIDIA GeForce RTX 2070 |
CUDA | 10.1 |
Tab. 2 Experimental configuration environment
Class | YOLOv3 | Tiny-YOLOv3 | Proposed algorithm |
---|---|---|---|
aero | 81.23 | 65.37 | 89.64 |
bike | 80.26 | 70.24 | 88.31 |
bird | 73.97 | 43.89 | 81.07 |
boat | 65.46 | 47.68 | 67.59 |
bottle | 64.12 | 24.97 | 68.22 |
bus | 81.53 | 68.96 | 85.21 |
car | 82.15 | 74.71 | 88.49 |
cat | 83.14 | 65.73 | 87.02 |
chair | 61.28 | 33.40 | 60.28 |
cow | 77.33 | 53.72 | 84.42 |
table | 75.58 | 49.11 | 75.66 |
dog | 82.19 | 61.19 | 87.99 |
horse | 84.69 | 75.34 | 86.72 |
mbike | 81.29 | 72.13 | 85.33 |
person | 78.46 | 69.10 | 86.81 |
plant | 52.18 | 26.90 | 47.01 |
sheep | 77.52 | 59.22 | 78.62 |
sofa | 74.41 | 50.90 | 82.56 |
train | 81.66 | 75.03 | 83.33 |
tv | 71.99 | 60.80 | 76.09 |
Tab. 3 Comparison of detection precision of different algorithms for different objects (AP at IoU=0.5, %)
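The per-class comparison is easier to read as differences. The sketch below copies the YOLOv3 and proposed-algorithm columns of Tab. 3 and lists the AP change per class; the variable names are illustrative, and the class labels use the usual Pascal VOC abbreviations.

```python
# (YOLOv3 AP, proposed-algorithm AP) at IoU=0.5, in %, copied from Tab. 3
ap = {
    "aero": (81.23, 89.64), "bike": (80.26, 88.31), "bird": (73.97, 81.07),
    "boat": (65.46, 67.59), "bottle": (64.12, 68.22), "bus": (81.53, 85.21),
    "car": (82.15, 88.49), "cat": (83.14, 87.02), "chair": (61.28, 60.28),
    "cow": (77.33, 84.42), "table": (75.58, 75.66), "dog": (82.19, 87.99),
    "horse": (84.69, 86.72), "mbike": (81.29, 85.33), "person": (78.46, 86.81),
    "plant": (52.18, 47.01), "sheep": (77.52, 78.62), "sofa": (74.41, 82.56),
    "train": (81.66, 83.33), "tv": (71.99, 76.09),
}

# AP change in percentage points (proposed minus YOLOv3), rounded to table precision
delta = {cls: round(ours - base, 2) for cls, (base, ours) in ap.items()}

# Classes where the proposed algorithm scores lower than YOLOv3
regressions = sorted(cls for cls, d in delta.items() if d < 0)

# Class with the largest improvement
best = max(delta, key=delta.get)

if __name__ == "__main__":
    for cls, d in sorted(delta.items(), key=lambda item: item[1], reverse=True):
        print(f"{cls:>7}: {d:+.2f}")
    print("lower than YOLOv3:", regressions)
    print("largest gain:", best)
```

In this table the proposed algorithm improves AP on 18 of the 20 classes, with lower values only for chair and plant.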
Algorithm | mAP@0.5/% | Detection time/ms |
---|---|---|
YOLOv3 | 77.37 | 20 |
Tiny-YOLOv3 | 57.34 | 6 |
Proposed algorithm | 83.26 | 28 |
Tab. 4 Performance comparison of three algorithms on Pascal VOC dataset
IoU threshold | YOLOv3 AP | Proposed algorithm AP |
---|---|---|
mAP | 31.00 | 34.28 |
0.50 | 55.30 | 55.88 |
0.55 | 53.40 | 54.01 |
0.60 | 48.80 | 49.20 |
0.65 | 44.01 | 46.26 |
0.70 | 39.03 | 41.63 |
0.75 | 33.84 | 35.59 |
0.80 | 23.43 | 26.00 |
0.85 | 10.62 | 16.29 |
0.90 | 5.00 | 7.03 |
0.95 | 0.52 | 0.94 |
Tab. 5 Detection results of mAP@[0.50:0.95] on COCO dataset (%)
Scale | YOLOv3 mAP | YOLOv3 Precision | YOLOv3 Recall | Proposed mAP | Proposed Precision | Proposed Recall |
---|---|---|---|---|---|---|
(0,110] | 69.28 | 59.61 | 66.92 | 75.66 | 71.45 | 73.25 |
(110,230] | 82.47 | 74.10 | 83.56 | 87.70 | 82.73 | 83.45 |
(230,400) | 84.75 | 75.44 | 84.72 | 88.19 | 81.64 | 85.68 |
Tab. 6 Detection results of objects with different scales (%)
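Precision and recall move by different amounts at each scale, so a single combined figure can help when reading Tab. 6. The sketch below derives the F1 score (the harmonic mean of precision and recall) from the table values. F1 is not reported in the source; it is computed here only as an illustration, and the names used are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in % per object scale, copied from Tab. 6
yolov3 = {
    "(0,110]": (59.61, 66.92),
    "(110,230]": (74.10, 83.56),
    "(230,400)": (75.44, 84.72),
}
proposed = {
    "(0,110]": (71.45, 73.25),
    "(110,230]": (82.73, 83.45),
    "(230,400)": (81.64, 85.68),
}

if __name__ == "__main__":
    for scale in yolov3:
        base = f1_score(*yolov3[scale])
        ours = f1_score(*proposed[scale])
        print(f"{scale}: F1 {base:.2f} -> {ours:.2f} ({ours - base:+.2f})")
```

On these numbers the derived F1 rises at all three scales, with the largest increase for the smallest objects, even though recall for the middle scale is slightly lower than that of YOLOv3.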
Algorithm | mAP/% |
---|---|
Faster R-CNN | 73.32 |
SSD | 72.66 |
Effi-YOLOv3 | 73.28 |
Ref. [ | 79.24 |
Ref. [ | 81.50 |
SSD+BiFPN+SENet | 80.24 |
Proposed algorithm | 83.26 |
Tab. 7 Comparison of detection results of different algorithms
Group | Improvements applied (among A to E) | Small-scale objects precision/% | Medium-scale objects precision/% | Large-scale objects precision/% | mAP/% | Speed/(frame/s) |
---|---|---|---|---|---|---|
1 | | 69.28 | 82.47 | 84.75 | 76.37 | 18.0 |
2 | √ | 70.34 | 82.21 | 83.23 | 75.79 | 16.1 |
3 | √ √ | 72.09 | 83.33 | 86.79 | 78.85 | 20.9 |
4 | √ √ √ | 72.45 | 84.10 | 86.89 | 79.24 | 21.2 |
5 | √ √ √ √ | 73.20 | 85.67 | 87.46 | 82.69 | 20.7 |
6 | √ √ √ √ √ | 75.66 | 87.70 | 88.19 | 83.26 | 22.0 |
Tab. 8 Comparison of ablation experimental results
[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. DOI: 10.1109/cvpr.2014.81
[2] GIRSHICK R. Fast R-CNN[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. DOI: 10.1109/iccv.2015.169
[3] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
[4] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
[5] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
[6] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2021-03-20].
[7] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2021-03-20].
[8] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
[9] FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. (2017-01-23) [2021-03-05].
[10] LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 404-419.
[11] ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[EB/OL]. (2019-04-25) [2021-05-06].
[12] LIU X N, WANG Z P, HE Y T, et al. Research on small target detection based on deep learning[J]. Tactical Missile Technology, 2019(1): 100-107.
[13] MA Q M, WANG M J, LIANG H R. License plate location detection algorithm based on improved YOLOv3 in complex scenes[J]. Computer Engineering and Applications, 2021, 57(7): 198-208.
[14] LIU D, WU Y J, LUO N C, et al. Object detection of Gaussian-YOLO v3 implanting attention and feature intertwine modules[J]. Journal of Computer Applications, 2020, 40(8): 2225-2230. DOI: 10.11772/j.issn.1001-9081.2020010030
[15] XU T, TANG G J, LIU Q P, et al. Improved YOLOv3 based on dilated convolution and Focal Loss[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2020, 40(6): 100-108. DOI: 10.14132/j.cnki.1673-5439.2020.06.015
[16] TIAN D X, LIN C M, ZHOU J S, et al. SA-YOLOv3: an efficient and accurate object detector using self-attention mechanism for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4099-4110. DOI: 10.1109/tits.2020.3041278
[17] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. DOI: 10.1109/iccv.2017.324
[18] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. DOI: 10.1109/cvpr.2019.00075
[19] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
[20] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. DOI: 10.1109/cvpr.2017.243
[21] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[22] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. DOI: 10.1007/s11263-009-0275-4
[23] HUAN H, CHEN Y F, ZHANG L, et al. An improved BR-YOLOv3 object detection network[J]. Computer Engineering, 2021, 47(10): 186-193. DOI: 10.19678/j.issn.1000-3428.0059234
[24] LIU Z Y, YUAN L, ZHU M C, et al. YOLOv3 traffic sign detection based on SPP and improved FPN[J]. Computer Engineering and Applications, 2021, 57(7): 164-170.