Journal of Computer Applications: 286-295. DOI: 10.11772/j.issn.1001-9081.2023121749
• Multimedia computing and computer simulation •
Ziyuan ZHOU1,2, Miao CHENG1,2,3, Lian HE1,2,3, Jiacheng ZHANG3
Received: 2023-12-03
Revised: 2024-03-12
Accepted: 2024-03-14
Online: 2025-01-24
Published: 2024-12-31
Contact: Miao CHENG
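About the author: Ziyuan ZHOU (born 2000), male, from Chengdu, Sichuan, master's student. Main research interests: artificial intelligence, machine vision.
CLC Number: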
Ziyuan ZHOU, Miao CHENG, Lian HE, Jiacheng ZHANG. Small and elongated object detection model based on improved YOLOv8[J]. Journal of Computer Applications, 0, (): 286-295.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023121749
Neck structure | mAP50:95/% | mAP50/% | Params/10^6 | Computation/GFLOPs |
---|---|---|---|---|
PAN | 40.7 | 77.1 | 3.0 | 8.1 |
BiFPN | 40.5 | 76.6 | 3.1 | 8.3 |
AFPN | 40.9 | 78.1 | 3.4 | 8.7 |
Smallod | 40.9 | 77.5 | 3.1 | 12.2 |
Slimneck | 40.0 | 75.8 | 2.8 | 7.3 |
WPAN (ours) | 42.1 | 80.9 | 4.1 | 9.7 |
Feature interaction module | mAP50:95/% | mAP50/% | Params/10^6 | Computation/GFLOPs |
---|---|---|---|---|
SPPF | 40.7 | 77.1 | 3.0 | 8.1 |
SPPFCSP | 40.6 | 75.5 | 4.6 | 9.4 |
SimCSPSPPF | 40.7 | 78.8 | 3.4 | 8.4 |
SE | 41.1 | 78.1 | 3.0 | 8.1 |
CA | 40.3 | 79.3 | 3.0 | 8.1 |
BAM | 41.2 | 79.2 | 3.0 | 8.1 |
CBAM | 40.8 | 79.6 | 3.0 | 8.1 |
BiFormer | 41.6 | 78.6 | 3.3 | 62.4 |
LSKA | 40.6 | 78.1 | 3.0 | 8.3 |
AMFI (ours) | 41.3 | 80.1 | 3.0 | 8.3 |
Bounding box regression loss function | mAP50:95/% | mAP50/% | Params/10^6 | Computation/GFLOPs |
---|---|---|---|---|
CIoU | 40.7 | 77.1 | 3.0 | 8.1 |
EIoU | 39.3 | 77.7 | 3.0 | 8.1 |
SIoU | 40.3 | 77.8 | 3.0 | 8.1 |
MPDIoU | 40.1 | 76.9 | 3.0 | 8.1 |
Wise-IoU | 40.9 | 78.1 | 3.0 | 8.1 |
NWD | 41.0 | 77.0 | 3.0 | 8.1 |
NWD & Inner-CIoU (ours) | 41.1 | 78.6 | 3.0 | 8.1 |
Model | mAP50:95/% | mAP50/% | Params/10^6 | Computation/GFLOPs |
---|---|---|---|---|
Faster R-CNN (VGG16) | 40.6 | 77.3 | 136.9 | 118.5 |
SSD300 | 40.4 | 62.9 | 30.8 | 24.7 |
EfficientDet-D2 | 41.0 | 79.6 | 8.0 | 10.4 |
FCOS | 41.3 | 72.7 | 32.1 | 80.7 |
YOLOv5n | 38.2 | 75.5 | 2.5 | 7.1 |
YOLOv5s | 40.6 | 79.3 | 9.1 | 23.8 |
YOLOv6n | 39.7 | 76.0 | 4.2 | 11.8 |
YOLOv6s | 40.0 | 78.7 | 16.3 | 44.0 |
YOLOv8n | 40.7 | 77.1 | 3.0 | 8.1 |
YOLOv8s | 42.2 | 79.6 | 11.1 | 28.5 |
YOLOv8m | 42.3 | 79.9 | 25.8 | 78.7 |
YOLOv8l | 42.5 | 80.1 | 43.6 | 164.9 |
YOLOv8x | 42.7 | 81.0 | 68.1 | 257.4 |
RT-DETR-R50 | 38.5 | 66.1 | 42.8 | 135.8 |
RT-DETR-R101 | 40.5 | 67.1 | 76.6 | 259.2 |
Proposed model | 42.6 | 81.7 | 4.1 | 9.9 |
Model | mAP50:95/% | mAP50/% | Params/10^6 | Computation/GFLOPs |
---|---|---|---|---|
YOLOv8n | 40.7 | 77.1 | 3.0 | 8.1 |
+WPAN | 42.1 | 80.9 | 4.1 | 9.7 |
+WPAN+AMFI | 42.3 | 81.4 | 4.1 | 9.9 |
+WANI | 42.6 | 81.7 | 4.1 | 9.9 |
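
The cost of each added component is easiest to read as a difference against the YOLOv8n baseline row. The following is a minimal sketch, not part of the paper: the rows are transcribed from the table above, and the names `ROWS` and `deltas_vs_baseline` are hypothetical helpers introduced only for this illustration. The `+WANI` row carries the same figures as the proposed model in the model comparison table.

```python
# Ablation rows transcribed from the table above:
# (configuration, mAP50:95 %, mAP50 %, params in 10^6, GFLOPs)
ROWS = [
    ("YOLOv8n", 40.7, 77.1, 3.0, 8.1),
    ("+WPAN", 42.1, 80.9, 4.1, 9.7),
    ("+WPAN+AMFI", 42.3, 81.4, 4.1, 9.9),
    ("+WANI", 42.6, 81.7, 4.1, 9.9),
]


def deltas_vs_baseline(rows):
    """Return, for every non-baseline row, its difference from the first row."""
    _, base_map, base_map50, base_params, base_flops = rows[0]
    result = {}
    for name, m, m50, params, flops in rows[1:]:
        # Round to one decimal to match the precision of the table.
        result[name] = {
            "d_map50_95": round(m - base_map, 1),
            "d_map50": round(m50 - base_map50, 1),
            "d_params": round(params - base_params, 1),
            "d_gflops": round(flops - base_flops, 1),
        }
    return result


if __name__ == "__main__":
    for name, delta in deltas_vs_baseline(ROWS).items():
        print(name, delta)
```

Read this way, the final row gains 1.9 points of mAP50:95 and 4.6 points of mAP50 over YOLOv8n, for an extra 1.1 × 10^6 parameters and 1.8 GFLOPs.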
Model | mAP50:95/% | mAP50/% | Params/10^6 | Computation/GFLOPs |
---|---|---|---|---|
YOLOv8n | 38.8 | 74.2 | 3.0 | 8.1 |
YOLOv8s | 38.5 | 73.7 | 11.1 | 28.4 |
Proposed model | 40.3 | 76.1 | 4.1 | 9.9 |