Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2423-2431. DOI: 10.11772/j.issn.1001-9081.2021060984
Special Issue: Artificial intelligence
Liying ZHANG1, Chunjiang PANG1, Xinying WANG1, Guoliang LI2
Received: 2021-06-10
Revised: 2021-09-28
Accepted: 2021-10-12
Online: 2021-12-27
Published: 2022-08-10
Contact: Xinying WANG
About author: ZHANG Liying, born in 1996 in Baoding, Hebei, M. S. candidate. Her research interests include image processing and deep learning.
Liying ZHANG, Chunjiang PANG, Xinying WANG, Guoliang LI. Multi-scale object detection algorithm based on improved YOLOv3[J]. Journal of Computer Applications, 2022, 42(8): 2423-2431.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021060984
Type | Filters | Size | Output |
---|---|---|---|
Convolutional | 32 | 3×3 | 416×416 |
Convolutional | 64 | 3×3 | 208×208 |
Convolutional | 32 | 1×1 | 208×208 |
Convolutional | 64 | 3×3 | 208×208 |
DenseNet unit | | | |
Convolutional | 64 | 1×1 | |
Average Pooling | | | 104×104 |
Convolutional | 64 | 1×1 | 104×104 |
Convolutional | 128 | 3×3 | 104×104 |
DenseNet unit | | | |
Convolutional | 128 | 1×1 | |
Convolutional | 256 | 3×3/2 | 52×52 |
Convolutional | 128 | 1×1 | |
DW Conv | 256 | 3×3 | |
SE Block | | | |
Convolutional | 128 | 1×1 | |
Residual | | | 52×52 |
Convolutional | 256 | 1×1 | |
Convolutional | 512 | 3×3/2 | 26×26 |
Convolutional | 256 | 1×1 | |
DW Conv | 512 | 3×3 | |
SE Block | | | |
Convolutional | 256 | 1×1 | |
Residual | | | 26×26 |
Convolutional | 512 | 1×1 | |
Convolutional | 1024 | 3×3/2 | 13×13 |
Convolutional | 512 | 1×1 | |
DW Conv | 1024 | 3×3 | |
SE Block | | | |
Convolutional | 512 | 1×1 | |
Residual | | | 13×13 |
Tab. 1 Improved backbone network
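The output column of Tab. 1 can be checked against the usual convolution size formula, floor((n + 2p − k) / s) + 1. The following is a minimal sketch and not the authors' code. The padding values, the 2×2 pooling kernel and the stride of the first downsampling convolution are assumptions, since the table does not state them.

```python
from typing import List, Tuple


def out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


# Each step is (kernel, stride, padding). All padding values are assumptions.
DOWNSAMPLING_STEPS: List[Tuple[int, int, int]] = [
    (3, 2, 1),  # assumed stride-2 3x3 convolution, 416 -> 208
    (2, 2, 0),  # assumed 2x2 average pooling, 208 -> 104
    (3, 2, 1),  # 3x3/2 convolution, 104 -> 52
    (3, 2, 1),  # 3x3/2 convolution, 52 -> 26
    (3, 2, 1),  # 3x3/2 convolution, 26 -> 13
]


def feature_map_sizes(input_size: int, steps: List[Tuple[int, int, int]]) -> List[int]:
    """Return the input size followed by the size after each downsampling step."""
    sizes = [input_size]
    for kernel, stride, padding in steps:
        sizes.append(out_size(sizes[-1], kernel, stride, padding))
    return sizes


sizes = feature_map_sizes(416, DOWNSAMPLING_STEPS)
print(sizes)  # [416, 208, 104, 52, 26, 13]
```

Under these assumptions the last three sizes (52, 26, 13) match the outputs of the three residual stages listed in the table.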
Configuration item | Specification |
---|---|
Programming language | Python |
Deep learning framework | PyTorch |
Operating system | Windows 10 |
CPU | Intel Core i5-8500 |
RAM | 16 GB |
GPU | NVIDIA GeForce GTX 2070 |
CUDA | 10.1 |
Tab. 2 Experimental configuration environment
Category | AP (IoU=0.5), YOLOv3 | AP (IoU=0.5), Tiny-YOLOv3 | AP (IoU=0.5), proposed algorithm |
---|---|---|---|
aero | 81.23 | 65.37 | 89.64 |
bike | 80.26 | 70.24 | 88.31 |
bird | 73.97 | 43.89 | 81.07 |
boat | 65.46 | 47.68 | 67.59 |
bottle | 64.12 | 24.97 | 68.22 |
bus | 81.53 | 68.96 | 85.21 |
car | 82.15 | 74.71 | 88.49 |
cat | 83.14 | 65.73 | 87.02 |
chair | 61.28 | 33.40 | 60.28 |
cow | 77.33 | 53.72 | 84.42 |
table | 75.58 | 49.11 | 75.66 |
dog | 82.19 | 61.19 | 87.99 |
horse | 84.69 | 75.34 | 86.72 |
mbike | 81.29 | 72.13 | 85.33 |
person | 78.46 | 69.10 | 86.81 |
plant | 52.18 | 26.90 | 47.01 |
sheep | 77.52 | 59.22 | 78.62 |
sofa | 74.41 | 50.90 | 82.56 |
train | 81.66 | 75.03 | 83.33 |
tv | 71.99 | 60.80 | 76.09 |
Tab. 3 Detection precision of different algorithms on different object categories
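The AP values in Tab. 3 are reported at an IoU threshold of 0.5. As a reminder of what that threshold measures, here is a minimal sketch of the intersection over union of two axis-aligned boxes. The `(x1, y1, x2, y2)` box format and the helper names are illustrative assumptions, not taken from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the ground truth enough to count as a hit.

    Some evaluators use a strict inequality here; >= is an assumption.
    """
    return iou(pred, truth) >= threshold


print(is_match((0, 0, 10, 10), (2, 0, 12, 10)))  # IoU = 80 / 120, a match
print(is_match((0, 0, 10, 10), (5, 0, 15, 10)))  # IoU = 50 / 150, not a match
```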
Algorithm | mAP@0.5/% | Detection time/ms |
---|---|---|
YOLOv3 | 77.37 | 20 |
Tiny-YOLOv3 | 57.34 | 6 |
Proposed algorithm | 83.26 | 28 |
Tab. 4 Performance comparison of three methods on Pascal VOC datasets
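For readers who think in frames per second, the detection times in Tab. 4 can be converted with fps = 1000 / ms. This small sketch assumes the reported time is the per-image latency of a single forward pass, which the table does not state explicitly.

```python
# Detection times in milliseconds, copied from Tab. 4.
detection_time_ms = {
    "YOLOv3": 20,
    "Tiny-YOLOv3": 6,
    "Proposed algorithm": 28,
}


def ms_to_fps(milliseconds: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / milliseconds


# Assumes one image per detection pass.
fps = {name: round(ms_to_fps(ms), 1) for name, ms in detection_time_ms.items()}
for name, value in fps.items():
    print(f"{name}: {value} frame/s")
```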
IoU value | AP, YOLOv3 | AP, proposed algorithm |
---|---|---|
mAP | 31.00 | 34.28 |
0.50 | 55.30 | 55.88 |
0.55 | 53.40 | 54.01 |
0.60 | 48.80 | 49.20 |
0.65 | 44.01 | 46.26 |
0.70 | 39.03 | 41.63 |
0.75 | 33.84 | 35.59 |
0.80 | 23.43 | 26.00 |
0.85 | 10.62 | 16.29 |
0.90 | 5.00 | 7.03 |
0.95 | 0.52 | 0.94 |
Tab. 5 Detection results of mAP@[0.50:0.95] on COCO dataset
Scale | YOLOv3 mAP | YOLOv3 Precision | YOLOv3 Recall | Proposed algorithm mAP | Proposed algorithm Precision | Proposed algorithm Recall |
---|---|---|---|---|---|---|
(0,110] | 69.28 | 59.61 | 66.92 | 75.66 | 71.45 | 73.25 |
(110,230] | 82.47 | 74.10 | 83.56 | 87.70 | 82.73 | 83.45 |
(230,400) | 84.75 | 75.44 | 84.72 | 88.19 | 81.64 | 85.68 |
Tab. 6 Detection results of objects with different scales
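Tab. 6 groups objects into three scale intervals and reports precision and recall for each. The sketch below shows one way such a grouping and the two metrics could be computed. The scalar size measure, the bin names and the example counts are hypothetical; only the interval boundaries come from the table.

```python
from typing import Optional, Tuple


def scale_bin(size: float) -> Optional[str]:
    """Map an object size to the intervals of Tab. 6.

    What "size" measures (for example, side length in pixels) is an assumption.
    """
    if 0 < size <= 110:
        return "small"   # (0, 110]
    if 110 < size <= 230:
        return "medium"  # (110, 230]
    if 230 < size < 400:
        return "large"   # (230, 400)
    return None          # outside the intervals listed in the table


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one scale bin, for illustration only.
precision, recall = precision_recall(tp=80, fp=20, fn=10)
print(scale_bin(96), f"precision={precision:.2%}", f"recall={recall:.2%}")
```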
Algorithm | mAP/% |
---|---|
Faster R-CNN | 73.32 |
SSD | 72.66 |
Effi-YOLOv3 | 73.28 |
Reference [ | 79.24 |
Reference [ | 81.50 |
SSD+BiFPN+SENet | 80.24 |
Proposed algorithm | 83.26 |
Tab. 7 Comparison of detection results of different algorithms
Group | Improvement A | Improvement B | Improvement C | Improvement D | Improvement E | Precision/%, small-scale objects | Precision/%, medium-scale objects | Precision/%, large-scale objects | mAP/% | Speed/(frame·s⁻¹) |
---|---|---|---|---|---|---|---|---|---|---|
1 | | | | | | 69.28 | 82.47 | 84.75 | 76.37 | 18.0 |
2 | √ | | | | | 70.34 | 82.21 | 83.23 | 75.79 | 16.1 |
3 | √ | √ | | | | 72.09 | 83.33 | 86.79 | 78.85 | 20.9 |
4 | √ | √ | √ | | | 72.45 | 84.10 | 86.89 | 79.24 | 21.2 |
5 | √ | √ | √ | √ | | 73.20 | 85.67 | 87.46 | 82.69 | 20.7 |
6 | √ | √ | √ | √ | √ | 75.66 | 87.70 | 88.19 | 83.26 | 22.0 |
Tab. 8 Comparison of ablation experimental results
1 | GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 580-587. 10.1109/cvpr.2014.81 |
2 | GIRSHICK R. Fast R-CNN [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1440-1448. 10.1109/iccv.2015.169 |
3 | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99. |
4 | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. 10.1109/cvpr.2016.91 |
5 | REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. 10.1109/cvpr.2017.690 |
6 | REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2021-03-20]. |
7 | BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2021-03-20]. |
8 | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37. |
9 | FU C Y, LIU W, RANGA A, et al. DSSD: deconvolutional single shot detector[EB/OL]. (2017-01-23) [2021-03-05]. |
10 | LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11215. Cham: Springer, 2018: 404-419. |
11 | ZHOU X Y, WANG D Q, KRÄHENBÜHL P. Objects as points[EB/OL]. (2019-04-25) [2021-05-06]. |
12 | LIU X N, WANG Z P, HE Y T, et al. Research on small target detection based on deep learning[J]. Tactical Missile Technology, 2019(1): 100-107. |
13 | MA Q M, WANG M J, LIANG H R. License plate location detection algorithm based on improved YOLOv3 in complex scenes[J]. Computer Engineering and Applications, 2021, 57(7): 198-208. |
14 | LIU D, WU Y J, LUO N C, et al. Object detection of Gaussian-YOLO v3 implanting attention and feature intertwine modules[J]. Journal of Computer Applications, 2020, 40(8): 2225-2230. 10.11772/j.issn.1001-9081.2020010030 |
15 | XU T, TANG G J, LIU Q P, et al. Improved YOLOv3 based on dilated convolution and Focal Loss[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2020, 40(6): 100-108. 10.14132/j.cnki.1673-5439.2020.06.015 |
16 | TIAN D X, LIN C M, ZHOU J S, et al. SA-YOLOv3: an efficient and accurate object detector using self-attention mechanism for autonomous driving[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4099-4110. 10.1109/tits.2020.3041278 |
17 | LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007. 10.1109/iccv.2017.324 |
18 | REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: a metric and a loss for bounding box regression [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 658-666. 10.1109/cvpr.2019.00075 |
19 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
20 | HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269. 10.1109/cvpr.2017.243 |
21 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745 |
22 | EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. 10.1007/s11263-009-0275-4 |
23 | HUAN H, CHEN Y F, ZHANG L, et al. An improved BR-YOLOv3 object detection network[J]. Computer Engineering, 2021, 47(10): 186-193. 10.19678/j.issn.1000-3428.0059234 |
24 | LIU Z Y, YUAN L, ZHU M C, et al. YOLOv3 traffic sign detection based on SPP and improved FPN[J]. Computer Engineering and Applications, 2021, 57(7): 164-170. |