《计算机应用》(Journal of Computer Applications) ›› 2023, Vol. 43 ›› Issue (9): 2727-2734. DOI: 10.11772/j.issn.1001-9081.2022081249

• Artificial Intelligence •


Feature pyramid network algorithm based on context information and multi-scale fusion importance awareness

Hao YANG, Yi ZHANG

  1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
  • Received: 2022-08-23 Revised: 2022-10-22 Accepted: 2022-11-03 Online: 2023-01-11 Published: 2023-09-10
  • Contact: Yi ZHANG
  • About author: YANG Hao, born in 1999 in Ya'an, Sichuan, M.S. candidate. His research interests include computer vision and object detection.
  • Supported by:
    National Natural Science Foundation of China (U20A20161)


Abstract:

Aiming at the problem that the classification and localization sub-tasks in object detection require a large receptive field and a high resolution respectively, and that it is difficult to balance these two contradictory requirements, a feature pyramid network algorithm based on the attention mechanism was proposed for object detection. In the algorithm, multiple different receptive fields were integrated to obtain richer semantic information, multi-scale feature maps were fused in a way that pays more attention to the importance of different feature maps, and the fused feature maps were further refined under the guidance of the attention mechanism. Firstly, multi-scale receptive fields were obtained through atrous convolutions with different dilation rates, which enhanced the semantic information while preserving the resolution. Secondly, through Multi-Level Fusion (MLF), multiple feature maps of different scales were resized to the same resolution by upsampling or pooling operations and then fused. Finally, the proposed Attention-guided Feature Refinement Module (AFRM) was used to refine the fused feature maps, enriching the semantic information and eliminating the aliasing effect caused by fusion. After replacing the Feature Pyramid Network (FPN) in Faster R-CNN with the proposed feature pyramid, experiments were conducted on the MS COCO 2017 dataset. The results show that with ResNet (Residual Network) backbones of depth 50 and 101, the Average Precision (AP) of the model reaches 39.2% and 41.0% respectively, which is 1.4 and 1.0 percentage points higher than that of Faster R-CNN using the original FPN. It can be seen that the proposed feature pyramid network algorithm can replace the original FPN and be better applied in object detection scenarios.

Key words: feature pyramid, object detection, context information, multi-scale feature fusion, attention mechanism
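
The abstract outlines three stages: parallel atrous convolutions that widen the receptive field while keeping resolution, multi-level fusion (MLF) that rescales pyramid levels to a common resolution and weights them by importance, and attention-guided refinement (AFRM) of the fused map. The Python (PyTorch) sketch below is only a minimal illustration of such a pipeline; the dilation rates, channel widths, learnable per-level weights, and the squeeze-and-excitation style channel attention are assumptions made for this example and are not taken from the paper.

# Minimal PyTorch sketch of the pipeline described in the abstract.
# All structural details (dilation rates, channel widths, SE-style attention)
# are illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextModule(nn.Module):
    """Enrich a feature map with multi-scale context via parallel atrous convolutions."""
    def __init__(self, channels, dilations=(1, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations]
        )
        self.project = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        # Every branch keeps the spatial resolution; only the receptive field grows.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))


class MultiLevelFusion(nn.Module):
    """Resize all pyramid levels to a common resolution, weight them, and fuse."""
    def __init__(self, num_levels):
        super().__init__()
        # Learnable per-level importance weights (softmax-normalised in forward).
        self.level_weights = nn.Parameter(torch.ones(num_levels))

    def forward(self, feats, target_size):
        # The paper mentions upsampling or pooling; nearest interpolation is used
        # here for both directions, purely for simplicity.
        resized = [f if f.shape[-2:] == target_size
                   else F.interpolate(f, size=target_size, mode="nearest") for f in feats]
        w = torch.softmax(self.level_weights, dim=0)
        return sum(wi * fi for wi, fi in zip(w, resized))


class AttentionRefinement(nn.Module):
    """Refine the fused map with channel attention to suppress fusion aliasing."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.smooth = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Global average pooling -> channel gate, then a smoothing convolution.
        gate = self.fc(x.mean(dim=(2, 3)))[..., None, None]
        return self.smooth(x * gate)


if __name__ == "__main__":
    # Toy three-level pyramid of 256-channel backbone features.
    feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
    ctx = ContextModule(256)
    fuse = MultiLevelFusion(num_levels=3)
    refine = AttentionRefinement(256)
    fused = fuse([ctx(f) for f in feats], target_size=(32, 32))
    print(refine(fused).shape)  # torch.Size([1, 256, 32, 32])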
