Journal of Computer Applications (《计算机应用》), sole official website


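Cross-domain few-shot classification model based on relation network and vision Transformer

Yiqin Yan1, Chuan Luo2, Tianrui Li3, Hongmei Chen4

  1. Sichuan University
  2. College of Computer Science, Sichuan University
  3. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
  4. Southwest Jiaotong University
  • Received: 2024-01-04 Revised: 2024-03-13 Online: 2024-04-28 Published: 2024-04-28
  • Corresponding author: Yiqin Yan
  • Funding:
    National Natural Science Foundation of China; National Natural Science Foundation of China; Natural Science Foundation of Sichuan Province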

Cross-domain few-shot classification model based on relation network and vision Transformer

Yiqin Yan1, Chuan Luo2, Tianrui Li3, Hongmei Chen4

  • Received:2024-01-04 Revised:2024-03-13 Online:2024-04-28 Published:2024-04-28
  • Contact: Yiqin Yan

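Abstract: To address the low classification accuracy of few-shot learning models under domain shift, a cross-domain few-shot image classification model based on the relation network and vision Transformer, ReViT (Relation Vision Transformer), is proposed. First, a vision Transformer is introduced as the feature extractor; this pre-trained deep neural network overcomes the insufficient feature representation ability of shallow networks. Second, a shallow convolutional network serves as a task adapter to improve the model's knowledge transfer ability, and a nonlinear classifier built on the relation network and a channel attention mechanism fuses the features of the feature extractor and the task adapter, thereby enhancing the model's generalization ability. Finally, a four-stage learning strategy of "pre-training, meta-training, fine-tuning, meta-testing" is used to train the model; by effectively combining transfer learning and meta-learning, it further improves the cross-domain classification performance of ReViT. Experimental results show that ReViT performs well on cross-domain few-shot classification. On the Meta-Dataset dataset, its classification accuracy in the in-domain and out-of-domain scenarios is 5.82 and 1.17 percentage points higher, respectively, than that of the second-best model. On the three sub-problems EuroSAT (European Satellite data), CropDisease and ISIC (International Skin Imaging Collaboration) of the BCDFSL (Broader study of Cross-Domain Few-Shot Learning) dataset, ReViT exceeds the second-best model by 1.00, 1.54 and 2.43 percentage points, respectively, in the 5-way 5-shot setting, and by 0.13, 0.97 and 3.40 percentage points, respectively, in the 5-way 20-shot setting. In the 5-way 50-shot setting on CropDisease, it exceeds the second-best model by 0.36 percentage points. ReViT maintains good accuracy on image classification tasks with scarce samples and can improve system efficiency in practical applications such as satellite image recognition, human skin disease recognition and crop disease recognition.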

Keywords: few-shot learning, relation network, cross-domain learning, meta-learning, image classification

Abstract: To address the poor classification accuracy of few-shot learning models under domain shift, a cross-domain few-shot image classification model based on the relation network and vision Transformer, named ReViT (Relation Vision Transformer), was proposed. First, a vision Transformer was introduced as the feature extractor, and this pre-trained deep neural network solved the problem of insufficient feature expression ability in shallow networks. Then, a shallow convolutional network was used as a task adapter to enhance the knowledge transfer ability of the model, and a nonlinear classifier was constructed based on the relation network and the channel attention mechanism to fuse the features of the feature extractor and the task adapter. Finally, a four-stage learning strategy of "pre-training, meta-training, fine-tuning, meta-testing" was adopted to train the model, which further improved the cross-domain classification performance of ReViT. The experimental results show that ReViT performs well on the cross-domain few-shot classification problem. On the Meta-Dataset dataset, the classification accuracies in the in-domain and out-of-domain scenarios are improved by 5.82 and 1.17 percentage points, respectively, compared with the second-best model. On the three sub-problems EuroSAT (European Satellite data), CropDisease and ISIC (International Skin Imaging Collaboration) of the BCDFSL (Broader study of Cross-Domain Few-Shot Learning) dataset, ReViT improves on the second-best model by 1.00, 1.54 and 2.43 percentage points, respectively, in the 5-way 5-shot setting, and by 0.13, 0.97 and 3.40 percentage points, respectively, in the 5-way 20-shot setting. In the 5-way 50-shot setting on CropDisease, the improvement is 0.36 percentage points. These results show that ReViT can be applied to image classification tasks with scarce samples, such as satellite image recognition, human skin disease recognition and crop disease recognition.

Keywords: few-shot learning, relation network, cross-domain, meta learning, image classification
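
The "5-way 5-shot" and "5-way 20-shot" settings mentioned in the abstract refer to the standard episodic evaluation protocol of few-shot learning: each task draws N classes, with K labelled support samples per class plus some query samples to classify. The following is a minimal sketch of episode sampling under that general protocol, not the authors' implementation; the function name `sample_episode`, the `n_query` parameter and the toy label list are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Return (support, query) lists of (sample_index, episode_label) pairs."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for both support and query sets qualify.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    support, query = [], []
    # Classes are relabelled 0..n_way-1 inside the episode.
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy data: 8 classes with 30 samples each (hypothetical, not from the paper).
toy_labels = [c for c in range(8) for _ in range(30)]
support, query = sample_episode(
    toy_labels, n_way=5, k_shot=5, n_query=15, rng=random.Random(0)
)
print(len(support), len(query))  # 25 75
```

In a 5-way 20-shot episode only `k_shot` changes; the classifier is then scored on how many query samples it assigns to the correct episode label.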
