Small and elongated object detection model based on improved YOLOv8

doi:10.11772/j.issn.1001-9081.2023121749

Abstract:

Real-time, accurate detection of glass defects is crucial. However, the task is highly challenging because defect morphology varies widely in scale and because the defects include both small objects with weak features and elongated objects with extreme aspect ratios. To address these challenges, a small and elongated object detection model based on improved YOLOv8 (You Only Look Once version 8) was proposed, named YOLO-WANI (WPAN+AMFI+NWD&Inner-CIoU). Firstly, a Weighted Path Aggregation Network (WPAN) was designed to reduce the loss of small and elongated object information during network propagation and to balance the importance of information at different scales. Secondly, an Attention-based Multi-scale Feature Interaction (AMFI) module was introduced to capture object-focused semantic information in deep features. Thirdly, the Normalized Wasserstein Distance (NWD) and the Inner-CIoU loss were employed to replace the original CIoU (Complete Intersection over Union) loss, improving detection efficiency for small and elongated objects. Finally, a glass defect detection dataset was created to validate model performance. Experimental results show that, compared with YOLOv8n, YOLO-WANI improves mAP50:95 by 1.9 percentage points and mAP50 by 4.6 percentage points on the created glass defect detection dataset, reaching 42.6% and 81.7%, respectively; on the steel defect detection dataset NEU-DET (the NorthEastern University surface defect database for defect DETection task), YOLO-WANI improves mAP50:95 by 1.5 percentage points and mAP50 by 1.9 percentage points, reaching 40.3% and 76.1%, respectively. The proposed model achieves the highest accuracy among the compared real-time defect detection models of every size class, with only 4.1×10⁶ parameters and a computational cost of 9.9 GFLOPs, a frame rate of 138 Frames Per Second (FPS), and a single-image inference time of (7.16±0.17) ms, meeting the requirements for a lightweight, high-accuracy model.

Key words: defect detection, multi-scale feature fusion, attention mechanism, bounding box regression, object detection
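
The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it derives the YOLOv8n baseline scores implied by each reported score and gain, and converts the reported 138 FPS into a per-image latency for comparison with the reported (7.16±0.17) ms. The function names `implied_baseline` and `fps_to_latency_ms` are hypothetical helpers.

```python
# Figures quoted in the abstract: (final score in %, gain in percentage points).
reported = {
    "glass": {"mAP50:95": (42.6, 1.9), "mAP50": (81.7, 4.6)},
    "NEU-DET": {"mAP50:95": (40.3, 1.5), "mAP50": (76.1, 1.9)},
}


def implied_baseline(final_pct, gain_pp):
    """Baseline score implied by a final score and a gain in percentage points."""
    return round(final_pct - gain_pp, 1)


def fps_to_latency_ms(fps):
    """Average time per image, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Baseline (YOLOv8n) scores implied by the abstract, per dataset and metric.
baselines = {
    dataset: {metric: implied_baseline(*pair) for metric, pair in metrics.items()}
    for dataset, metrics in reported.items()
}

# Latency implied by 138 FPS, compared with the reported 7.16 +/- 0.17 ms range.
latency_ms = fps_to_latency_ms(138)
within_reported_range = (7.16 - 0.17) <= latency_ms <= (7.16 + 0.17)

print(baselines)
print(round(latency_ms, 2), within_reported_range)
```

For the glass defect dataset, the implied baseline values (40.7 and 77.1) agree with the PAN, SPPF, and CIoU baseline rows in the result tables, and the latency implied by 138 FPS falls inside the reported inference-time interval.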

CLC Number:

TP391.4

Ziyuan ZHOU, Miao CHENG, Lian HE, Jiacheng ZHANG. Small and elongated object detection model based on improved YOLOv8[J]. Journal of Computer Applications, 0, (): 286-295.

Neck structure	mAP50:95/%	mAP50/%	Parameters/10⁶	Computation/GFLOPs
PAN	40.7	77.1	3.0	8.1
BiFPN	40.5	76.6	3.1	8.3
AFPN	40.9	78.1	3.4	8.7
Smallod	40.9	77.5	3.1	12.2
Slim-Neck	40.0	75.8	2.8	7.3
WPAN (ours)	42.1	80.9	4.1	9.7
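
The abstract describes WPAN as balancing the importance of information at different scales, but this page does not give its internal design. As a rough illustration of what weighting fused features can mean, the sketch below uses the fast normalized fusion idea from BiFPN (one of the compared necks), not WPAN itself. The function name `fast_normalized_fusion`, the toy feature vectors, and the weight values are all assumptions made for this example.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors using non-negative, normalized weights.

    This is the BiFPN-style weighting scheme, shown here as a stand-in;
    it is NOT the WPAN design, which this page does not specify.
    """
    # Clip negative weights to zero so every input contributes non-negatively.
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped) + eps  # eps avoids division by zero
    norm = [w / total for w in clipped]
    fused = [
        sum(n * feat[i] for n, feat in zip(norm, features))
        for i in range(len(features[0]))
    ]
    return fused, norm


# Toy inputs: a shallow (high-resolution) and a deep (semantic) feature vector,
# assumed to be already resized to the same shape.
shallow = [1.0, 2.0]
deep = [3.0, 4.0]

# Hypothetical learned weights: the deep branch is trusted three times as much.
fused, norm = fast_normalized_fusion([shallow, deep], [1.0, 3.0])
print(fused, norm)
```

In a trained network the weights would be learnable parameters, so the model itself decides how much each scale contributes to the fused feature.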

Feature interaction module	mAP50:95/%	mAP50/%	Parameters/10⁶	Computation/GFLOPs
SPPF	40.7	77.1	3.0	8.1
SPPFCSP	40.6	75.5	4.6	9.4
SimCSPSPPF	40.7	78.8	3.4	8.4
SE	41.1	78.1	3.0	8.1
CA	40.3	79.3	3.0	8.1
BAM	41.2	79.2	3.0	8.1
CBAM	40.8	79.6	3.0	8.1
BiFormer	41.6	78.6	3.3	62.4
LSKA	40.6	78.1	3.0	8.3
AMFI (ours)	41.3	80.1	3.0	8.3

Bounding box regression loss function	mAP50:95/%	mAP50/%	Parameters/10⁶	Computation/GFLOPs
CIoU	40.7	77.1	3.0	8.1
EIoU	39.3	77.7	3.0	8.1
SIoU	40.3	77.8	3.0	8.1
MPDIoU	40.1	76.9	3.0	8.1
Wise-IoU	40.9	78.1	3.0	8.1
NWD	41.0	77.0	3.0	8.1
NWD&Inner-CIoU (ours)	41.1	78.6	3.0	8.1
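
To make the two ingredients of the proposed loss more concrete, the following minimal sketch computes plain IoU, the normalized Wasserstein distance (boxes modeled as 2D Gaussians), and the Inner-IoU of scaled auxiliary boxes for a thin, slightly misaligned box pair. It follows the general definitions of NWD and Inner-IoU as commonly stated; the constant `c`, the `ratio` values, and the example boxes are assumptions, and how the paper weights NWD against Inner-CIoU is not given on this page, so no combined loss is shown.

```python
import math


def to_cxcywh(box):
    """Convert (x1, y1, x2, y2) to (center x, center y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def iou(a, b):
    """Standard intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance; c is a dataset-dependent constant (assumed here)."""
    cxa, cya, wa, ha = to_cxcywh(a)
    cxb, cyb, wb, hb = to_cxcywh(b)
    w2 = math.sqrt(
        (cxa - cxb) ** 2 + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    )
    return math.exp(-w2 / c)


def inner_iou(a, b, ratio=0.75):
    """IoU of auxiliary boxes that keep each center but scale width and height by ratio."""
    def scaled(box):
        cx, cy, w, h = to_cxcywh(box)
        return (cx - w * ratio / 2, cy - h * ratio / 2,
                cx + w * ratio / 2, cy + h * ratio / 2)
    return iou(scaled(a), scaled(b))


# An elongated ground-truth box (100 x 4 pixels) and a prediction shifted by 3 pixels.
gt = (0.0, 0.0, 100.0, 4.0)
pred = (0.0, 3.0, 100.0, 7.0)

print(iou(gt, pred))                    # plain IoU collapses for thin boxes
print(nwd(gt, pred))                    # NWD degrades smoothly with the offset
print(inner_iou(gt, pred, ratio=1.25))  # larger auxiliary boxes still overlap
```

For thin boxes, a shift of a few pixels removes most of the overlap, so IoU-based losses give a weak signal, whereas NWD depends only on center and size differences and stays informative. This is one plausible reading of why the pairing suits small and elongated defects.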