Journal of Computer Applications: 286-295. DOI: 10.11772/j.issn.1001-9081.2023121749

• Multimedia computing and computer simulation •

Small and elongated object detection model based on improved YOLOv8

Ziyuan ZHOU1,2, Miao CHENG1,2,3, Lian HE1,2,3, Jiacheng ZHANG3

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610213, China
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  3. Shenzhen CBPM-KEXIN Banking Technology Company Limited, Shenzhen Guangdong 518206, China
  • Received: 2023-12-03 Revised: 2024-03-12 Accepted: 2024-03-14 Online: 2025-01-24 Published: 2024-12-31
  • Contact: Miao CHENG

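  • About the authors: Ziyuan ZHOU, born in 2000 in Chengdu, Sichuan, male, M.S. candidate. His research interests include artificial intelligence and machine vision.
    Miao CHENG, born in 1983 in Chengdu, Sichuan, male, M.S., senior engineer. His research interests include artificial intelligence and machine vision.
    Lian HE, born in 1983 in Xichong, Sichuan, female, Ph.D., senior engineer. Her research interests include artificial intelligence and machine vision.
    Jiacheng ZHANG, born in 1985 in Changning, Hunan, male, M.S., engineer. His research interests include artificial intelligence and machine vision.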

Abstract:

Real-time, accurate detection of glass defects is crucial. However, the task is highly challenging because defect morphologies vary widely in scale, small objects have weak features, and elongated objects have extreme aspect ratios. To address these challenges, a small and elongated object detection model based on improved YOLOv8 (You Only Look Once version 8), named YOLO-WANI (WPAN+AMFI+NWD&Inner-CIoU), was proposed. Firstly, a Weighted Path Aggregation Network (WPAN) was designed to reduce the loss of information about small and elongated objects during network propagation and to balance the importance of information at different scales. Then, an Attention-based Multi-scale Feature Interaction (AMFI) module was introduced to capture semantic information focused on objects in deep features. After that, Normalized Wasserstein Distance (NWD) and Inner-CIoU losses were employed to replace the original CIoU (Complete Intersection over Union) loss, improving detection efficiency for small and elongated objects. Finally, a glass defect detection dataset was created to validate the model performance. Experimental results show that, compared with YOLOv8n, YOLO-WANI improves mAP50:95 by 1.9 percentage points and mAP50 by 4.6 percentage points on the created glass defect detection dataset, reaching 42.6% and 81.7%, respectively; on the steel defect detection dataset NEU-DET (the NorthEastern University surface defect database for defect DETection task), YOLO-WANI improves mAP50:95 by 1.5 percentage points and mAP50 by 1.9 percentage points, reaching 40.3% and 76.1%, respectively. The proposed model achieves the highest precision among real-time defect detection models of all scales, with only 4.1 million parameters, a computational cost of 9.9 GFLOPs, a Frames Per Second (FPS) of 138, and a single-image inference time of (7.16±0.17) ms, meeting the requirements of lightweight design and high precision.
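
To illustrate why a Wasserstein-based measure can help with elongated objects, the following minimal sketch compares IoU with the commonly published form of the normalized Wasserstein distance, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian. It is not the authors' implementation: the function names, the example boxes, and the normalizing constant `c` are assumptions made for illustration, and how YOLO-WANI combines NWD with Inner-CIoU is not specified here.

```python
import math


def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    Each box is treated as a Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4). The constant c is dataset dependent; the
    default here is an assumed placeholder, not a value from the paper.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


# Hypothetical elongated defect: 4 px wide, 40 px tall, and a prediction
# shifted 4 px sideways so the two boxes only touch at an edge.
target = (10.0, 20.0, 4.0, 40.0)
pred = (14.0, 20.0, 4.0, 40.0)

print("IoU:", iou(target, pred))
print("NWD:", nwd(target, pred))
print("NWD loss:", 1.0 - nwd(target, pred))
```

In this example the boxes do not overlap, so IoU is zero and gives no indication of how close the prediction is, whereas the NWD value still varies smoothly with the 4 px offset. This is the property that makes such a measure attractive for thin or tiny boxes.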

Key words: defect detection, multi-scale feature fusion, attention mechanism, bounding box regression, object detection

CLC Number: