Journal of Computer Applications


Optimization model for small object detection based on multi-level feature bidirectional fusion

PAN Yexin 1,2, YANG Zhe 1,2   

  1. Department of Computer Science and Technology, Soochow University
  2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University)
  • Received: 2023-09-15 Revised: 2023-11-20 Online: 2024-03-15 Published: 2024-03-15
  • Contact: YANG Zhe
  • About author: PAN Yexin, born in 1999, a native of Suzhou, Jiangsu, M. S. candidate, CCF member (J1626G). His research interests include computer vision and deep learning. YANG Zhe, born in 1978, a native of Suzhou, Jiangsu, Ph. D., associate professor. His research interests include artificial intelligence and big data.
  • Supported by:
    National Natural Science Foundation of China (62002253), Ministry of Education's Collaborative Education Program on Industry and Education (220606363154256), National College Student Innovation and Entrepreneurship Training Program Project (202210285042Z)

基于多级特征双向融合的小目标检测优化模型

潘烨新 1,2, 杨哲 1,2


Abstract: Small objects carry few features of their own, and deep networks lose part of that information as depth increases, so small object detection has long been a difficult problem in object detection. To address this, a model that optimizes small object detection through repeated feature enhancement within the network structure was proposed. Firstly, the spatial pyramid pooling module in the backbone network was replaced to optimize gradient calculation. Secondly, multi-level feature enhancement was achieved by applying multi-scale bidirectional fusion that distinguishes feature levels in the network neck and by adding adaptive fusion to the output head. Experimental data show that on the COCO 2017-val dataset, at an IoU (Intersection over Union) threshold of 0.5, the mean average precision of the model reaches 61.4%, which is 4.7 percentage points higher than that of the currently popular YOLOv7 model. The detection frame rate of the model on a single GPU is 78.2 frames/s, which meets industrial-level detection speed. The experimental results show that the proposed model improves the accuracy of small object detection.

Key words: deep learning, small target, object detection, computer vision, feature fusion
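
The abstract reports precision at an IoU threshold of 0.5. As a minimal sketch of that criterion (not the authors' evaluation code), the following assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the function names `iou` and `is_match` are hypothetical and chosen only for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) form (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a hit when its IoU with the ground truth reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# A 10x10 prediction shifted by 2 pixels in each direction against a 10x10 ground truth.
small_pred = (0, 0, 10, 10)
small_gt = (2, 2, 12, 12)
print(iou(small_pred, small_gt), is_match(small_pred, small_gt))
```

For the 10x10 boxes above, a 2-pixel shift already drops the IoU below 0.5, which illustrates why localization errors that are negligible for large objects can turn a small-object prediction into a miss.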

