Journal of Computer Applications, 2025, Vol. 45, Issue (9): 2993-3002. DOI: 10.11772/j.issn.1001-9081.2024081227

• Multimedia computing and computer simulation

Contextual semantic representation and pixel relationship correction for few-shot object detection

Lili WEI¹, Lirong YAN¹, Xiaofen TANG¹,²

  1. School of Information Engineering, Ningxia University, Yinchuan Ningxia 750021, China
    2. Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West (Ningxia University), Yinchuan Ningxia 750021, China
  • Received: 2024-09-02; Revised: 2024-09-30; Accepted: 2024-11-04; Online: 2024-11-19; Published: 2025-09-10
  • Contact: Xiaofen TANG
  • About author: WEI Lili, born in 1996, native of Jinan, Shandong, M. S. candidate, CCF member. Her research interests include few-shot object detection.
    YAN Lirong, born in 1999, native of Zhongwei, Ningxia, M. S. candidate, CCF member. Her research interests include few-shot object detection.
  • Supported by:
    National Natural Science Foundation of China (61966029)

Abstract:

In few-shot object detection, support samples are scarce and the available class information is insufficient, so making effective use of the feature information in the limited samples is particularly important. Enriching the usable semantic information in both support and query samples allows query features and support features to be matched more comprehensively, which helps the model understand target classes in few-shot scenarios and thus detect objects effectively. Therefore, a model based on spatial context and pixel relationships was proposed. A spatial context learning module was designed to help each pixel construct a local context region, so that the center pixel gathers the semantics of the pixels in that region and the image feature information is enriched. In addition, because spatial context easily introduces noisy information, a pixel context relationship module was designed that uses the original feature knowledge in the image to explore the relationships between pixels and to construct intra-class and inter-class relationship maps, thereby correcting the noise introduced by the spatial context learning module. Experimental results show that, under the 1-shot setting where samples are extremely scarce, the proposed model improves the Average Precision at an IoU threshold of 0.5 (AP50) by 2.7, 2.0 and 1.3 percentage points over VFA (Variational Feature Aggregation) on the three splits of the PASCAL VOC dataset, respectively. On the MS COCO dataset under the 10-shot and 30-shot settings, it improves AP by 0.4 and 0.6 percentage points, respectively, over VFA, and improves AP50 by 11.4 and 8.7 percentage points, respectively, over Meta FR-CNN (Meta Faster R-CNN).
It can be seen that the proposed method improves the model's ability to recognize novel-class samples by using limited feature information more effectively, and it offers a reference for improving the generalization ability of detection models in special scenarios where only a very small number of samples can be obtained.
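The abstract does not give the modules' equations, so the following is only a rough illustration of the two ideas under loudly stated assumptions, not the authors' method. `local_context` is a hypothetical stand-in for the spatial context learning module, implemented as generic softmax-weighted local attention over a k×k window with a residual connection. `relation_maps` is a hypothetical stand-in for the pixel context relationship module: it computes cosine similarities between the original pixel features and splits them into intra-class and inter-class maps using assumed per-pixel labels. Passing `labels` to `local_context` restricts aggregation to same-class neighbours, which is one plausible way to suppress the context noise the abstract mentions.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional pixel features
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat similarity with a zero vector as 0
    return dot(a, b) / (na * nb)


def local_context(feat: FeatureMap, k: int = 3,
                  labels: Optional[List[List[int]]] = None) -> FeatureMap:
    """Hypothetical spatial-context step (generic local attention).

    Each center pixel aggregates its k x k neighbourhood (clipped at the
    borders) with softmax weights from dot-product similarity, and the
    aggregated context is added back to the center feature. If `labels`
    is given (an assumption), only same-label neighbours contribute."""
    h, w, r = len(feat), len(feat[0]), k // 2
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            center = feat[i][j]
            # The center pixel always passes the label filter, so the list is never empty.
            neigh = [feat[y][x]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for x in range(max(0, j - r), min(w, j + r + 1))
                     if labels is None or labels[y][x] == labels[i][j]]
            scores = [dot(center, n) for n in neigh]
            m = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            ctx = [sum(wt * n[c] for wt, n in zip(weights, neigh)) / z
                   for c in range(len(center))]
            row.append([a + b for a, b in zip(center, ctx)])  # residual add
        out.append(row)
    return out


def relation_maps(feat: FeatureMap,
                  labels: List[List[int]]) -> Tuple[Matrix, Matrix, Matrix]:
    """Hypothetical pixel-relationship step.

    Cosine similarity between every pair of pixels of the ORIGINAL
    (un-enriched) features, split into an intra-class map (same label)
    and an inter-class map (different labels); intra + inter == sim."""
    pix = [v for row in feat for v in row]
    lab = [lb for row in labels for lb in row]
    n = len(pix)
    sim = [[cosine(pix[a], pix[b]) for b in range(n)] for a in range(n)]
    intra = [[sim[a][b] if lab[a] == lab[b] else 0.0 for b in range(n)]
             for a in range(n)]
    inter = [[sim[a][b] if lab[a] != lab[b] else 0.0 for b in range(n)]
             for a in range(n)]
    return sim, intra, inter


if __name__ == "__main__":
    # Toy 2 x 2 map: top row is "class 0", bottom row is "class 1".
    feat = [[[1.0, 0.0], [1.0, 0.0]],
            [[0.0, 1.0], [0.0, 1.0]]]
    labels = [[0, 0], [1, 1]]
    print(local_context(feat)[0][0])                 # mixes in class-1 features
    print(local_context(feat, labels=labels)[0][0])  # [2.0, 0.0]
    sim, intra, inter = relation_maps(feat, labels)
    print(intra[0], inter[0])
```

The label mask can only be applied where per-pixel classes are known, for example from annotated boxes during training; how the paper actually obtains and uses relationship information at inference time is not stated in the abstract.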

Key words: few-shot object detection, rich semantic information, feature information matching, spatial context, pixel relationship mapping
