Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1437-1444. DOI: 10.11772/j.issn.1001-9081.2023050699

• 2023 CCF Conference on Artificial Intelligence (CCFAI 2023)

Few-shot object detection via fusing multi-scale and attention mechanism

Hongtian LI¹, Xinhao SHI¹, Weiguo PAN¹, Cheng XU¹, Bingxin XU¹, Jiazheng YUAN¹,²

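  1. Beijing Key Laboratory of Information Service Engineering (Beijing Union University), Beijing 100101
  2. College of Science and Technology, Beijing Open University, Beijing 100081
  • Received: 2023-05-08 Revised: 2023-06-11 Accepted: 2023-06-16 Online: 2023-08-01 Published: 2024-05-10
  • Corresponding author: Weiguo PAN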
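  • About the authors: LI Hongtian, born in 1998 in Zhaoqing, Guangdong, M. S. candidate. His research interests include image processing and computer vision.
    SHI Xinhao, born in 1999 in Rizhao, Shandong, M. S. candidate. His research interests include reinforcement learning and computer vision.
    XU Cheng, born in 1988 in Wuhai, Inner Mongolia, Ph. D., lecturer. His research interests include cognitive computing and Internet of Vehicles security.
    XU Bingxin, born in 1985 in Jilin City, Jilin, Ph. D., associate professor, CCF member. Her research interests include image processing and computer vision.
    YUAN Jiazheng, born in 1971 in Shaoyang, Hunan, Ph. D., professor. His research interests include artificial intelligence and visual computing.
    First contact: PAN Weiguo, born in 1984 in Handan, Hebei, Ph. D., associate professor. His research interests include computer vision and intelligent driving.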
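  • Supported by:
    Beijing Natural Science Foundation (4232026); National Natural Science Foundation of China (62171042); Beijing Key Science and Technology Project (KZ202211417048); High-level Research and Innovation Team Project of Beijing Municipal Universities (BPHR20220120); Collaborative Innovation Center of Chaoyang District, Beijing (CYX2203); Beijing Union University Research Project (ZK10202202)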

Few-shot object detection via fusing multi-scale and attention mechanism

Hongtian LI¹, Xinhao SHI¹, Weiguo PAN¹, Cheng XU¹, Bingxin XU¹, Jiazheng YUAN¹,²

  1. Beijing Key Laboratory of Information Service Engineering (Beijing Union University), Beijing 100101, China
  2. College of Science and Technology, Beijing Open University, Beijing 100081, China
  • Received: 2023-05-08 Revised: 2023-06-11 Accepted: 2023-06-16 Online: 2023-08-01 Published: 2024-05-10
  • Contact: Weiguo PAN
  • About the authors: LI Hongtian, born in 1998, M. S. candidate. His research interests include image processing and computer vision.
    SHI Xinhao, born in 1999, M. S. candidate. His research interests include reinforcement learning and computer vision.
    XU Cheng, born in 1988, Ph. D., lecturer. His research interests include cognitive computing and Internet of Vehicles security.
    XU Bingxin, born in 1985, Ph. D., associate professor. Her research interests include image processing and computer vision.
    YUAN Jiazheng, born in 1971, Ph. D., professor. His research interests include artificial intelligence and visual computing.
  • Supported by:
    Beijing Natural Science Foundation (4232026); National Natural Science Foundation of China (62171042); Beijing Key Science and Technology Project (KZ202211417048); High-level Research and Innovation Team Project of Beijing Higher Education Institutions (BPHR20220120); Collaborative Innovation Center of Chaoyang District, Beijing (CYX2203); Beijing Union University Research Project (ZK10202202)

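Abstract:

Existing fine-tuning-based two-stage few-shot object detection methods are insensitive to the features of new classes and tend to misclassify a new class as a base class that closely resembles it, which degrades the detection performance of the model. To address this problem, a few-shot object detection algorithm fusing multi-scale and attention mechanism (MA-FSOD) was proposed. Firstly, grouped convolutions and large convolution kernels were used in the backbone network to extract more class-discriminative features, and a Convolutional Block Attention Module (CBAM) was added to enhance features adaptively. Then, an improved pyramid network was used for multi-scale feature fusion, so that the Region Proposal Network (RPN) could accurately locate Regions of Interest (RoI) and supply the classification head with richer high-quality positive samples from multiple scales. Finally, a cosine classification head was used for classification in the fine-tuning stage to reduce intra-class variance. On the PASCAL-VOC 2007/2012 datasets, compared with the Few-Shot object detection via Contrastive proposal Encoding (FSCE) algorithm, MA-FSOD improved AP50 for new classes by 5.6 percentage points; on the more challenging MSCOCO dataset, compared with Meta-Faster-RCNN, the AP for 10-shot and 30-shot improved by 0.1 and 1.6 percentage points, respectively. Experimental results show that, compared with some mainstream few-shot object detection algorithms, MA-FSOD alleviates the misclassification problem more effectively and achieves higher-accuracy few-shot object detection.

Keywords: transfer learning, few-shot object detection, attention mechanism, multi-scale feature fusion, cosine similarity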

Abstract:

Existing two-stage few-shot object detection methods based on fine-tuning are insensitive to the features of new classes and tend to misclassify new classes as base classes that are highly similar to them, which degrades the detection performance of the model. To address this issue, a few-shot object detection algorithm that incorporates multi-scale and attention mechanism was proposed, namely MA-FSOD (Few-Shot Object Detection via fusing Multi-scale and Attention mechanism). Firstly, grouped convolutions and large convolution kernels were used to extract more class-discriminative features in the backbone network, and a Convolutional Block Attention Module (CBAM) was added to achieve adaptive feature enhancement. Then, a modified pyramid network was used to achieve multi-scale feature fusion, which enables the Region Proposal Network (RPN) to accurately find Regions of Interest (RoI) and provide more abundant high-quality positive samples from multiple scales to the classification head. Finally, a cosine classification head was used for classification in the fine-tuning stage to reduce intra-class variance. On the PASCAL-VOC 2007/2012 datasets, compared with the Few-Shot object detection via Contrastive proposal Encoding (FSCE) algorithm, the MA-FSOD algorithm improved AP50 for new classes by 5.6 percentage points; on the more challenging MSCOCO dataset, compared with Meta-Faster-RCNN, the APs for 10-shot and 30-shot were improved by 0.1 and 1.6 percentage points, respectively. Experimental results show that MA-FSOD alleviates the misclassification problem more effectively and achieves higher accuracy in few-shot object detection than some mainstream few-shot object detection algorithms.

Key words: transfer learning, few-shot object detection, attention mechanism, multi-scale feature fusion, cosine similarity
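
The abstract states that a cosine classification head is used in the fine-tuning stage to reduce intra-class variance. The sketch below illustrates what such a head might compute, under stated assumptions: the function names, the scale factor `alpha` (set to 20.0 here), and the toy 3-dimensional features are all hypothetical and are not taken from the paper. It is a minimal illustration of cosine-similarity classification in general, not the authors' implementation.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def cosine_logits(feature, class_weights, alpha=20.0):
    """Return alpha * cos(feature, w_j) for every class weight vector w_j.

    alpha is an assumed scale factor; the abstract does not specify one.
    """
    f_hat = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        logits.append(alpha * sum(a * b for a, b in zip(f_hat, w_hat)))
    return logits

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data: two class weight vectors and one RoI feature.
class_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
feature = [3.0, 0.5, 0.0]
scaled_feature = [30.0, 5.0, 0.0]  # same direction, 10x the magnitude

logits = cosine_logits(feature, class_weights)
logits_scaled = cosine_logits(scaled_feature, class_weights)
probs = softmax(logits)
```

Because both the feature and the class weights are normalized, the logits depend only on direction: `logits` and `logits_scaled` agree even though the two features differ tenfold in magnitude. This insensitivity to feature norm is one common explanation for why a cosine head can lower intra-class variance when only a few samples per class are available.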

CLC number: