Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2423-2431. DOI: 10.11772/j.issn.1001-9081.2021060984
Section: Artificial Intelligence
Authors: Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI
Received: 2021-06-10
Revised: 2021-09-28
Accepted: 2021-10-12
Online: 2021-12-27
Published: 2022-08-10
Contact: Xinying WANG
About author: ZHANG Liying, born in 1996 in Baoding, Hebei, M. S. candidate. Her research interests include image processing and deep learning.
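Abstract:
To further improve the speed and accuracy of multi-scale object detection, and to address the missed, false, and duplicate detections that small objects tend to cause, an object detection algorithm based on an improved YOLOv3 (You Only Look Once v3) is proposed for the automatic detection of multi-scale objects. First, the structure of the feature extraction network is improved: an attention mechanism is introduced into the spatial dimension of the residual modules so that the network attends to small objects. Then, a densely connected network (DenseNet) is used to fully fuse the shallow-layer information of the network, and depthwise separable convolution replaces the ordinary convolution in the backbone network, which reduces the number of model parameters and raises the detection speed. In the feature fusion network, a bidirectional pyramid structure fuses deep and shallow features in both directions, and the 3-scale prediction is extended to 4-scale prediction, which improves the ability to learn multi-scale features. For the loss function, GIoU (Generalized Intersection over Union) is adopted to improve recognition accuracy and reduce the missed detection rate. Experimental results show that the algorithm based on the improved YOLOv3 reaches a mean Average Precision (mAP) of 83.26% on the Pascal VOC test set, 5.89 percentage points higher than the original YOLOv3, with a detection speed of 22.0 frame/s. On the COCO dataset, its mAP is 3.28 percentage points higher than that of the original YOLOv3. Its mAP also improves when detecting objects at multiple scales, which verifies the effectiveness of the algorithm.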
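The GIoU loss named in the abstract can be sketched as follows. This is a minimal illustration of the standard GIoU definition (IoU minus the fraction of the smallest enclosing box not covered by the union), not the authors' implementation. The function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box format are assumptions made for this example.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection: clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU = IoU - (enclosing area not covered by the union) / enclosing area.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Loss in [0, 2]: 0 for identical boxes, larger for worse predictions."""
    return 1.0 - giou(pred_box, target_box)


if __name__ == "__main__":
    print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes
```

Unlike plain IoU, which is zero for every pair of disjoint boxes, GIoU still changes as disjoint boxes move closer together, so the loss stays informative when a prediction does not overlap its target.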
Cite this article:
Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI. Multi-scale object detection algorithm based on improved YOLOv3[J]. Journal of Computer Applications, 2022, 42(8): 2423-2431.
Type | Filters | Size | Output |
---|---|---|---|
Convolutional | 32 | 3×3 | 416×416 |
Convolutional | 64 | 3×3 | 208×208 |
Convolutional | 32 | 1×1 | 208×208 |
Convolutional | 64 | 3×3 | 208×208 |
DenseNet unit | | | |
Convolutional | 64 | 1×1 | |
Average Pooling | | | 104×104 |
Convolutional | 64 | 1×1 | 104×104 |
Convolutional | 128 | 3×3 | 104×104 |
DenseNet unit | | | |
Convolutional | 128 | 1×1 | |
Convolutional | 256 | 3×3/2 | 52×52 |
Convolutional | 128 | 1×1 | |
DW Conv | 256 | 3×3 | |
SE Block | | | |
Convolutional | 128 | 1×1 | |
Residual | | | 52×52 |
Convolutional | 256 | 1×1 | |
Convolutional | 512 | 3×3/2 | 26×26 |
Convolutional | 256 | 1×1 | |
DW Conv | 512 | 3×3 | |
SE Block | | | |
Convolutional | 256 | 1×1 | |
Residual | | | 26×26 |
Convolutional | 512 | 1×1 | |
Convolutional | 1024 | 3×3/2 | 13×13 |
Convolutional | 512 | 1×1 | |
DW Conv | 1024 | 3×3 | |
SE Block | | | |
Convolutional | 512 | 1×1 | |
Residual | | | 13×13 |
Tab. 1 Improved backbone network
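To illustrate why replacing ordinary convolution with depthwise separable convolution reduces the parameter count, the sketch below compares weight counts for the three DW Conv rows of the backbone table. It assumes each DW Conv row is a 3×3 depthwise convolution followed by a 1×1 pointwise convolution, takes the input channel count from the preceding 1×1 layer, and ignores biases and normalization parameters. The function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # (input channels, output channels) read from the DW Conv rows,
    # assuming the input is the output of the preceding 1x1 layer.
    for c_in, c_out in [(128, 256), (256, 512), (512, 1024)]:
        std = standard_conv_params(c_in, c_out)
        dws = depthwise_separable_params(c_in, c_out)
        print(f"{c_in}->{c_out}: standard={std}, separable={dws}, "
              f"ratio={dws / std:.3f}")
```

For a 3×3 kernel the ratio is 1/c_out + 1/9, so each of these layers keeps roughly one ninth of the weights of its ordinary counterpart.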
Item | Specification |
---|---|
Programming language | Python |
Deep learning framework | PyTorch |
Operating system | Windows 10 |
CPU | Intel Core i5-8500 |
RAM | 16 GB |
GPU | NVIDIA GeForce GTX 2070 |
CUDA | 10.1 |
Tab. 2 Experimental configuration environment
Class | YOLOv3 | Tiny-YOLOv3 | Proposed algorithm |
---|---|---|---|
aero | 81.23 | 65.37 | 89.64 |
bike | 80.26 | 70.24 | 88.31 |
bird | 73.97 | 43.89 | 81.07 |
boat | 65.46 | 47.68 | 67.59 |
bottle | 64.12 | 24.97 | 68.22 |
bus | 81.53 | 68.96 | 85.21 |
car | 82.15 | 74.71 | 88.49 |
cat | 83.14 | 65.73 | 87.02 |
chair | 61.28 | 33.40 | 60.28 |
cow | 77.33 | 53.72 | 84.42 |
table | 75.58 | 49.11 | 75.66 |
dog | 82.19 | 61.19 | 87.99 |
horse | 84.69 | 75.34 | 86.72 |
mbike | 81.29 | 72.13 | 85.33 |
person | 78.46 | 69.10 | 86.81 |
plant | 52.18 | 26.90 | 47.01 |
sheep | 77.52 | 59.22 | 78.62 |
sofa | 74.41 | 50.90 | 82.56 |
train | 81.66 | 75.03 | 83.33 |
tv | 71.99 | 60.80 | 76.09 |
Tab. 3 Comparison of different algorithms for different objects on detection precision (AP at IoU=0.5, %)
Algorithm | mAP@0.5/% | Detection time/ms |
---|---|---|
YOLOv3 | 77.37 | 20 |
Tiny-YOLOv3 | 57.34 | 6 |
Proposed algorithm | 83.26 | 28 |
Tab. 4 Performance comparison of three methods on Pascal VOC dataset
IoU threshold | YOLOv3 | Proposed algorithm |
---|---|---|
mAP | 31.00 | 34.28 |
0.50 | 55.30 | 55.88 |
0.55 | 53.40 | 54.01 |
0.60 | 48.80 | 49.20 |
0.65 | 44.01 | 46.26 |
0.70 | 39.03 | 41.63 |
0.75 | 33.84 | 35.59 |
0.80 | 23.43 | 26.00 |
0.85 | 10.62 | 16.29 |
0.90 | 5.00 | 7.03 |
0.95 | 0.52 | 0.94 |
Tab. 5 Detection results of mAP@[0.50:0.95] on COCO dataset (AP, %)
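AP falls as the IoU threshold rises because a detection only counts as a true positive when its overlap with the ground truth reaches the threshold. The sketch below shows this for a single box pair across the ten thresholds from 0.50 to 0.95. The boxes are made-up illustrative values and the helper names are hypothetical. This is not the evaluation code used in the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union


def thresholds_passed(pred_box, gt_box):
    """IoU thresholds in 0.50, 0.55, ..., 0.95 at which pred is a match."""
    # Build thresholds from integers to avoid accumulating float error.
    thresholds = [t / 100 for t in range(50, 100, 5)]
    overlap = iou(pred_box, gt_box)
    return [t for t in thresholds if overlap >= t]


if __name__ == "__main__":
    gt = (0, 0, 100, 100)    # illustrative ground-truth box
    pred = (0, 0, 100, 68)   # illustrative prediction covering 68% of it
    print(iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

A prediction with IoU 0.68 is counted as correct at the four loosest thresholds and as a false positive at the remaining six, which is why the stricter rows of such a table are dominated by localization quality.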
Scale | YOLOv3 mAP | YOLOv3 Precision | YOLOv3 Recall | Proposed mAP | Proposed Precision | Proposed Recall |
---|---|---|---|---|---|---|
(0,110] | 69.28 | 59.61 | 66.92 | 75.66 | 71.45 | 73.25 |
(110,230] | 82.47 | 74.10 | 83.56 | 87.70 | 82.73 | 83.45 |
(230,400) | 84.75 | 75.44 | 84.72 | 88.19 | 81.64 | 85.68 |
Tab. 6 Detection results of objects with different scales (%)
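A per-scale evaluation of this kind needs two ingredients: assigning each object to a scale interval, and computing precision and recall from the match counts in that interval. The sketch below uses the three intervals from the table. It assumes the interval bounds refer to an object size in pixels, which this page does not state. The function names and the labels "small", "medium", and "large" are chosen for illustration.

```python
def scale_bucket(size):
    """Map an object size to one of the table's scale intervals.

    Assumption: `size` is measured in pixels; the page does not say
    which measure of object size the intervals refer to.
    """
    if 0 < size <= 110:
        return "small"    # (0, 110]
    if 110 < size <= 230:
        return "medium"   # (110, 230]
    if 230 < size < 400:
        return "large"    # (230, 400)
    return None           # outside all three intervals


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    for size in (64, 110, 111, 230, 300):
        print(size, scale_bucket(size))
    # Illustrative counts only, not taken from the paper.
    print(precision_recall(tp=8, fp=2, fn=2))
```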
Algorithm | mAP/% |
---|---|
Faster R-CNN | 73.32 |
SSD | 72.66 |
Effi-YOLOv3 | 73.28 |
Ref. [ | 79.24 |
Ref. [ | 81.50 |
SSD+BiFPN+SENet | 80.24 |
Proposed algorithm | 83.26 |
Tab. 7 Comparison of detection results of different algorithms
Group | A | B | C | D | E | Small-scale accuracy/% | Medium-scale accuracy/% | Large-scale accuracy/% | mAP/% | Speed/(frame/s) |
---|---|---|---|---|---|---|---|---|---|---|
1 | | | | | | 69.28 | 82.47 | 84.75 | 76.37 | 18.0 |
2 | √ | | | | | 70.34 | 82.21 | 83.23 | 75.79 | 16.1 |
3 | √ | √ | | | | 72.09 | 83.33 | 86.79 | 78.85 | 20.9 |
4 | √ | √ | √ | | | 72.45 | 84.10 | 86.89 | 79.24 | 21.2 |
5 | √ | √ | √ | √ | | 73.20 | 85.67 | 87.46 | 82.69 | 20.7 |
6 | √ | √ | √ | √ | √ | 75.66 | 87.70 | 88.19 | 83.26 | 22.0 |
Tab. 8 Comparison of ablation experimental results (columns A to E mark the improvements enabled in each group)
References:
[1] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. DOI: 10.1109/cvpr.2014.81
[2] GIRSHICK R. Fast R-CNN [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. DOI: 10.1109/iccv.2015.169
[3] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99.
[4] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. DOI: 10.1109/cvpr.2016.91
[5] REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. DOI: 10.1109/cvpr.2017.690
[6] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2021-03-20].
[7] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2021-03-20].
[8] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
[9] FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. (2017-01-23) [2021-03-05].
[10] LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 404-419.
[11] ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[EB/OL]. (2019-04-25) [2021-05-06].
[12] LIU X N, WANG Z P, HE Y T, et al. Research on small target detection based on deep learning[J]. Tactical Missile Technology, 2019(1): 100-107. (in Chinese)
[13] MA Q M, WANG M J, LIANG H R. License plate location detection algorithm based on improved YOLOv3 in complex scenes[J]. Computer Engineering and Applications, 2021, 57(7): 198-208. (in Chinese)
[14] LIU D, WU Y J, LUO N C, et al. Object detection of Gaussian-YOLO v3 implanting attention and feature intertwine modules[J]. Journal of Computer Applications, 2020, 40(8): 2225-2230. DOI: 10.11772/j.issn.1001-9081.2020010030 (in Chinese)
[15] XU T, TANG G J, LIU Q P, et al. Improved YOLOv3 based on dilated convolution and Focal Loss[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2020, 40(6): 100-108. DOI: 10.14132/j.cnki.1673-5439.2020.06.015 (in Chinese)
[16] TIAN D X, LIN C M, ZHOU J S, et al. SA-YOLOv3: an efficient and accurate object detector using self-attention mechanism for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4099-4110. DOI: 10.1109/tits.2020.3041278
[17] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. DOI: 10.1109/iccv.2017.324
[18] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. DOI: 10.1109/cvpr.2019.00075
[19] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
[20] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. DOI: 10.1109/cvpr.2017.243
[21] HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. DOI: 10.1109/cvpr.2018.00745
[22] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. DOI: 10.1007/s11263-009-0275-4
[23] HUAN H, CHEN Y F, ZHANG L, et al. An improved BR-YOLOv3 object detection network[J]. Computer Engineering, 2021, 47(10): 186-193. DOI: 10.19678/j.issn.1000-3428.0059234 (in Chinese)
[24] LIU Z Y, YUAN L, ZHU M C, et al. YOLOv3 traffic sign detection based on SPP and improved FPN[J]. Computer Engineering and Applications, 2021, 57(7): 164-170. (in Chinese)