Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2871-2877. DOI: 10.11772/j.issn.1001-9081.2023091274

• Multimedia computing and computer simulation

Optimization model for small object detection based on multi-level feature bidirectional fusion

Yexin PAN1,2, Zhe YANG1,2

  1. School of Computer Science & Technology, Soochow University, Suzhou Jiangsu 215006, China
  2. Jiangsu Provincial Key Laboratory for Computer Information Processing Technology (Soochow University), Suzhou Jiangsu 215006, China
  • Received: 2023-09-18; Revised: 2023-11-28; Accepted: 2023-12-01; Online: 2024-03-15; Published: 2024-09-10
  • Contact: Zhe YANG
  • About author: PAN Yexin, born in 1999, M.S. candidate. His research interests include computer vision and deep learning.
  • Supported by: National Natural Science Foundation of China (62002253); Collaborative Education Program on Industry and Education of Ministry of Education (220606363154256); National College Student Innovation and Entrepreneurship Training Program Project (202210285042Z)


潘烨新 (PAN Yexin)1,2, 杨哲 (YANG Zhe)1,2

  • About author: 潘烨新 (PAN Yexin), male, from Suzhou, Jiangsu, CCF member.


Small objects carry few inherent features, and further features are lost as network depth increases; for these objective reasons, small object detection has long been a challenging problem in the field of object detection. To address this problem, an optimization model for small object detection was proposed that applies multiple feature enhancements to the network structure. Firstly, Spatial Pyramid Pooling (SPP) in the backbone network was replaced to optimize gradient computation. Secondly, multi-level bidirectional fusion that differentiates between feature levels was performed in the network neck, and an Adaptive Feature Fusion (AFF) module was added to the output head, achieving multi-level feature enhancement. Experimental results show that on the COCO2017-val dataset, at an Intersection over Union (IoU) threshold of 0.5, the mean Average Precision (mAP) of the proposed model reaches 61.4%, 4.7 percentage points higher than that of the currently popular YOLOv7 model. Meanwhile, the proposed model runs at 78.2 frame/s on a single GPU, which meets the speed requirements of industrial detection.
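The headline metric above, AP at an IoU threshold of 0.5 (often written AP50 or mAP@0.5), can be illustrated with a minimal sketch. It assumes corner-format boxes (x1, y1, x2, y2), a single class and a single image, greedy matching of confidence-ranked detections to still-unmatched ground-truth boxes, and all-point interpolation of the precision-recall curve. The function names `iou` and `average_precision` are hypothetical; this is not the paper's evaluation code.

```python
from typing import List, Sequence, Tuple

# Assumption: boxes are (x1, y1, x2, y2) in corner format.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union (IoU) of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[float, Box]],  # (confidence, box) for one class
    ground_truth: Sequence[Box],
    iou_threshold: float = 0.5,  # the "IoU = 0.5" setting, i.e. AP50
) -> float:
    """Hypothetical helper: single-class, single-image AP, all-point interpolation."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags: List[bool] = []
    # Visit detections from most to least confident; each ground-truth box
    # can be claimed by at most one detection.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = best_j >= 0 and best_iou >= iou_threshold
        if is_tp:
            matched[best_j] = True
        tp_flags.append(is_tp)

    # Precision and recall after each ranked detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp_count = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        tp_count += int(is_tp)
        precisions.append(tp_count / rank)
        recalls.append(tp_count / len(ground_truth))

    # Precision envelope: make precision non-increasing from right to left.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
    print(f"AP50 = {average_precision(dets, gts):.4f}")
```

Running the script prints `AP50 = 0.8333`: the false positive ranked second lowers precision before the second true positive is found. The mAP reported above averages such per-class AP values over the COCO categories; COCO-style evaluation also accumulates detections across all images per class and samples precision at 101 recall points, so this simplified sketch will not reproduce official numbers exactly.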

Key words: deep learning, small object, object detection, computer vision, feature fusion


