Journal of Computer Applications, 2024, Vol. 44, Issue (8): 2580-2587. DOI: 10.11772/j.issn.1001-9081.2023081113

• Multimedia computing and computer simulation •

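Logo detection algorithm based on improved YOLOv5

Yeheng LI, Guangsheng LUO, Qianmin SU

  1. College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2023-08-18 Revised: 2023-10-23 Accepted: 2023-11-02 Online: 2023-12-18 Published: 2024-08-10
  • Corresponding author: Guangsheng LUO
  • About the authors: LI Yeheng, born in 1997 in Wuhan, Hubei, M. S. candidate. His research interests include deep learning and object detection.
    LUO Guangsheng, born in 1982 in Huangshi, Hubei, Ph. D., associate professor. His research interests include few-shot learning, federated learning and object detection. luoguangsheng03@126.com
    SU Qianmin, born in 1974 in Shanghai, Ph. D., associate professor. His research interests include deep learning, knowledge graphs and smart healthcare.
  • Supported by:
    Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology (2020AAA0109300)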

Logo detection algorithm based on improved YOLOv5

Yeheng LI, Guangsheng LUO, Qianmin SU

  1. College of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2023-08-18 Revised: 2023-10-23 Accepted: 2023-11-02 Online: 2023-12-18 Published: 2024-08-10
  • Contact: Guangsheng LUO
  • About the authors: LI Yeheng, born in 1997, M. S. candidate. His research interests include deep learning and object detection.
    LUO Guangsheng, born in 1982, Ph. D., associate professor. His research interests include few-shot learning, federated learning and object detection.
    SU Qianmin, born in 1974, Ph. D., associate professor. His research interests include deep learning, knowledge graphs and smart healthcare.
  • Supported by:
    Scientific and Technological Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0109300)

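Abstract:

To address the complex backgrounds and highly variable target sizes found in logo images, an improved detection algorithm based on YOLOv5 is proposed. First, the Convolutional Block Attention Module (CBAM) is incorporated to compress features along the channel and spatial dimensions, extracting the key information and important regions of the image. Then, Switchable Atrous Convolution (SAC) lets the network adaptively adjust the receptive field size of the feature maps at different scales, so that it captures object information across scales and detects multi-scale targets better. Finally, the Normalized Wasserstein Distance (NWD) is embedded into the loss function: bounding boxes are modeled as 2D Gaussian distributions and the similarity between the corresponding distributions is computed, which measures the similarity between targets more faithfully and improves small-object detection performance as well as model robustness and stability. Experimental results show that on the smaller dataset FlickrLogos-32 the improved algorithm reaches a mean Average Precision (mAP@0.5) of 90.6%, 1 percentage point higher than the original YOLOv5; on the larger dataset QMULOpenLogo it reaches an mAP@0.5 of 62.7%, 2.3 percentage points higher than the original YOLOv5; and on LogoDet3K, a logo detection dataset targeting specific categories, it improves mAP@0.5 over the original algorithm by 1.2, 1.4 and 1.4 percentage points on three classes of logos, indicating better small-object detection ability on logo images.

Key words: Logo detection, YOLOv5 network model, CBAM, small object detection, Normalized Wasserstein Distance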

Abstract:

To address the complex backgrounds and varying target sizes in logo images, an improved detection algorithm based on YOLOv5 was proposed. Firstly, the Convolutional Block Attention Module (CBAM) was incorporated to compress features along the channel and spatial dimensions, extracting the critical information and significant regions of the image. Subsequently, Switchable Atrous Convolution (SAC) was employed so that the network could adaptively adjust the receptive field size of feature maps at different scales, improving the detection of objects across multiple scales. Finally, the Normalized Wasserstein Distance (NWD) was embedded into the loss function: bounding boxes were modeled as 2D Gaussian distributions, and the similarity between the corresponding Gaussian distributions was calculated to better measure the similarity among objects, thereby enhancing detection performance for small objects and improving model robustness and stability. Compared with the original YOLOv5 algorithm: on the small dataset FlickrLogos-32, the improved algorithm achieved a mean Average Precision (mAP@0.5) of 90.6%, an increase of 1 percentage point; on the large dataset QMULOpenLogo, it achieved an mAP@0.5 of 62.7%, an increase of 2.3 percentage points; on LogoDet3K, for three types of logos, it increased the mAP@0.5 by 1.2, 1.4, and 1.4 percentage points respectively. Experimental results demonstrate that the improved algorithm has better small-object detection ability on logo images.

Key words: Logo detection, YOLOv5 network model, Convolutional Block Attention Module (CBAM), small object detection, Normalized Wasserstein Distance (NWD)
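
The NWD step named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used formulation in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), so that the squared 2-Wasserstein distance between two boxes reduces to a squared Euclidean distance between the vectors (cx, cy, w/2, h/2). The function names, the constant `c = 12.8`, and the `1 - NWD` loss form are illustrative assumptions; the abstract does not state how the paper combines NWD with the YOLOv5 loss.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent normalizing constant; 12.8 is only a
    placeholder here, not a value taken from the paper.
    """
    cx_a, cy_a, w_a, h_a = box_a
    cx_b, cy_b, w_b, h_b = box_b
    # Squared 2-Wasserstein distance between the two box Gaussians.
    w2_squared = (
        (cx_a - cx_b) ** 2
        + (cy_a - cy_b) ** 2
        + ((w_a - w_b) / 2) ** 2
        + ((h_a - h_b) / 2) ** 2
    )
    # Map the distance into (0, 1]; identical boxes give exactly 1.
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred_box, target_box, c=12.8):
    """Assumed loss form: 1 - NWD (0 when the boxes coincide)."""
    return 1.0 - nwd(pred_box, target_box, c)


def iou(box_a, box_b):
    """Plain IoU for (cx, cy, w, h) boxes, included only for contrast."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union


# Two small 4x4 boxes that just miss each other: IoU is zero, so it
# carries no signal about how far apart they are, while NWD still does.
small_pred = (10.0, 10.0, 4.0, 4.0)
small_target = (14.0, 10.0, 4.0, 4.0)
print(iou(small_pred, small_target), nwd(small_pred, small_target))
```

The contrast at the end is the usual motivation for NWD on small objects: a shift of a few pixels can drive IoU to zero for a tiny box, whereas the Gaussian-based similarity decays smoothly with the offset.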

CLC number: