Journal of Computer Applications, 2025, Vol. 45, Issue (4): 1095-1103. DOI: 10.11772/j.issn.1001-9081.2023121852

• Artificial intelligence •

Cross-domain few-shot classification model based on relation network and Vision Transformer

Yiqin YAN1, Chuan LUO1, Tianrui LI2, Hongmei CHEN2

  1. College of Computer Science, Sichuan University, Chengdu Sichuan 610065, China
  2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu Sichuan 611756, China
  • Received: 2024-01-09 Revised: 2024-03-13 Accepted: 2024-03-18 Online: 2024-04-28 Published: 2025-04-10
  • Contact: Chuan LUO
  • About author: YAN Yiqin, born in 1998, M. S. candidate. His research interests include machine learning and computer vision.
    LUO Chuan, born in 1987, Ph. D., associate professor. His research interests include artificial intelligence, data mining, and knowledge discovery.
    LI Tianrui, born in 1969, Ph. D., professor. His research interests include artificial intelligence, data mining, and knowledge discovery.
    CHEN Hongmei, born in 1971, Ph. D., professor. Her research interests include data mining and granular computing.
  • Supported by:
    National Natural Science Foundation of China (62076171); Natural Science Foundation of Sichuan Province (2022NSFSC0898)

基于关系网络和Vision Transformer的跨域小样本分类模型

严一钦1, 罗川1, 李天瑞2, 陈红梅2

  1. 四川大学 计算机学院,成都 610065
    2.西南交通大学 计算机与人工智能学院,成都 611756
  • 通讯作者: 罗川
  • 作者简介:严一钦(1998—),男,四川成都人,硕士研究生,主要研究方向:机器学习、计算机视觉
    罗川(1987—),男,河南固始人,副教授,博士,主要研究方向:人工智能、数据挖掘、知识发现
    李天瑞(1969—),男,福建莆田人,教授,博士,主要研究方向:人工智能、数据挖掘、知识发现
    陈红梅(1971—),女,四川成都人,教授,博士,主要研究方向:数据挖掘、粒计算。
  • 基金资助:
    国家自然科学基金资助项目(62076171);四川省自然科学基金资助项目(2022NSFSC0898)

Abstract:

To address the poor classification accuracy of few-shot learning models under domain shift, a cross-domain few-shot image classification model based on the relation network and ViT (Vision Transformer), named ReViT (Relation ViT), was proposed. Firstly, ViT was introduced as the feature extractor, and a pre-trained deep neural network was employed to overcome the insufficient feature expression ability of shallow neural networks. Secondly, a shallow convolutional network was used as a task adapter to enhance the knowledge transfer ability of the model, and a non-linear classifier was constructed on the basis of the relation network and the channel attention mechanism. Thirdly, the features of the feature extractor and the task adapter were fused to enhance the generalization ability of the model. Finally, a four-stage learning strategy of “pre-training, meta-training, fine-tuning, meta-testing” was adopted to train the model, which further improved the cross-domain classification performance of ReViT by effectively integrating transfer learning and meta learning. Experimental results using average classification accuracy as the evaluation metric show that ReViT performs well on cross-domain few-shot classification problems. Specifically, on Meta-Dataset, the classification accuracies of ReViT in in-domain and out-of-domain scenarios are 5.82 and 1.71 percentage points higher, respectively, than those of the second-best model. On the three sub-problems EuroSAT (European SATellite data), CropDisease, and ISIC (International Skin Imaging Collaboration) of the BCDFSL (Broader study of Cross-Domain Few-Shot Learning) dataset, the 5-way 5-shot classification accuracies of ReViT are 1.00, 1.54, and 2.43 percentage points higher, respectively, than those of the second-best model. On 5-way 20-shot tasks for EuroSAT, CropDisease, and ISIC, the accuracies of ReViT are 0.13, 0.97, and 3.40 percentage points higher, respectively, than those of the second-best model. On the 5-way 50-shot task for CropDisease, the accuracy of ReViT is 0.36 percentage points higher than that of the second-best model. These results show that ReViT maintains good classification accuracy on image classification tasks with scarce samples.
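
The "N-way K-shot" tasks and the average classification accuracy used as the metric above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy scalar data, and the nearest-support classifier are all hypothetical stand-ins, and the number of query items per class is an assumption, since the abstract does not state it.

```python
import random
from statistics import mean


def sample_episode(items_by_label, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode as (support, query) lists of (item, label)."""
    classes = rng.sample(sorted(items_by_label), n_way)
    support, query = [], []
    for label in classes:
        # Draw disjoint support and query items for this class.
        items = rng.sample(items_by_label[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


def episode_accuracy(predict, support, query):
    """Fraction of query items that `predict` labels correctly."""
    correct = sum(predict(support, x) == y for x, y in query)
    return correct / len(query)


def average_accuracy(predict, items_by_label, n_way, k_shot, n_query,
                     n_episodes, seed=0):
    """Mean accuracy over many sampled episodes, in percent."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_episodes):
        support, query = sample_episode(items_by_label, n_way, k_shot, n_query, rng)
        scores.append(episode_accuracy(predict, support, query))
    return 100.0 * mean(scores)


def percentage_point_gain(acc_new, acc_ref):
    """Difference of two accuracies given in percent (not a relative change)."""
    return acc_new - acc_ref


def nearest_support(support, x):
    """Toy classifier (hypothetical): label of the closest support item."""
    return min(support, key=lambda pair: abs(pair[0] - x))[1]


# Toy data (hypothetical): 8 well-separated classes of 30 scalar "images" each.
toy = {c: [c * 10 + i * 0.1 for i in range(30)] for c in range(8)}
acc = average_accuracy(nearest_support, toy, n_way=5, k_shot=5,
                       n_query=15, n_episodes=20)
```

In this sketch a 5-way 5-shot episode contains 25 labelled support items, and a gain such as "1.00 percentage points" is the plain difference between two averaged accuracies expressed in percent.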

Key words: few-shot learning, relation network, cross-domain learning, meta learning, image classification

摘要:

针对小样本学习模型在数据域存在偏移时分类准确度不高的问题,提出一种基于关系网络和ViT (Vision Transformer)的跨域小样本图像分类模型ReViT (Relation ViT)。首先,引入ViT作为特征提取器,并使用经过预训练的深层神经网络解决浅层神经网络的特征表达能力不足的问题;其次,以浅层卷积网络作为任务适配器提升模型的知识迁移能力,并基于关系网络和通道注意力机制构建非线性分类器;随后,将特征提取器和任务适配器进行特征融合,从而增强模型的泛化能力;最后,采取“预训练-元学习-微调-元测试”四阶段学习策略训练模型,有效融合迁移学习与元学习,进一步提升ReViT的跨域分类性能。以平均分类准确率为评估指标的实验结果表明,ReViT在跨域小样本分类问题上有良好的性能。具体地,ReViT的分类准确度在Meta-Dataset的域内场景下和域外场景下相较于次优的模型分别提升了5.82和1.71个百分点,在BCDFSL (Broader study of Cross-Domain Few-Shot Learning)数据集的3个子问题EuroSAT(European SATellite data)、CropDisease和ISIC (International Skin Imaging Collaboration)的5-way 5-shot上相较于次优的模型分别提升了1.00、1.54和2.43个百分点,在EuroSAT、CropDisease和ISIC的5-way 20-shot上相较于次优的模型分别提升了0.13、0.97和3.40个百分点,在CropDisease的5-way 50-shot上相较于次优的模型提升了0.36个百分点。可见,ReViT能在样本量稀少的图像分类任务上保持良好的准确率。

关键词: 小样本学习, 关系网络, 跨域学习, 元学习, 图像分类

CLC Number: