Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 365-374. DOI: 10.11772/j.issn.1001-9081.2021020230

• Artificial Intelligence •

Performance optimization method for pre-trained models with convolution structure based on derivative-free few-shot learning

Yaming LI1,2, Kai XING1,2(), Hongwu DENG1,2, Zhiyong WANG1,2, Xuan HU1,2

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
  2. Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
  • Received: 2021-02-07 Revised: 2021-03-18 Accepted: 2021-03-26 Online: 2021-04-07 Published: 2022-02-10
  • Corresponding author: Kai XING
  • About the authors: LI Yaming (1996—), born in Ganzhou, Jiangxi, M.S. candidate. His research interests include deep learning.
    XING Kai (1981—), born in Suzhou, Jiangsu, Ph.D., associate professor, CCF member. His research interests include operating systems and data mining.
    DENG Hongwu (1996—), born in Anqing, Anhui, M.S. candidate. His research interests include deep learning.
    WANG Zhiyong (1996—), born in Shangqiu, Henan, M.S. candidate. His research interests include deep learning.
    HU Xuan (1995—), born in Chizhou, Anhui, M.S. candidate. Her research interests include deep learning.


Abstract:

Deep learning models with convolution structure show poor generalization performance in few-shot learning scenarios. To address this problem, taking AlexNet and ResNet as examples, a performance optimization method for pre-trained models with convolution structure based on derivative-free few-shot learning was proposed. Firstly, the sample data were modulated based on causal intervention to generate series data from non-series data, and the pre-trained model was directionally pruned based on the co-integration test from the perspective of data distribution stationarity. Then, based on the Capital Asset Pricing Model (CAPM) and optimal transport theory, forward learning without gradient propagation was carried out on the intermediate outputs of the pre-trained model and a new structure was constructed, thereby generating representation vectors with clear inter-class separability in the distribution space. Finally, the generated effective features were adaptively weighted based on the self-attention mechanism and aggregated in the fully connected layer, producing embedding vectors with weak correlation. Experimental results show that the proposed method raises the Top-1 accuracy of the AlexNet and ResNet convolution-structured pre-trained models on 100 classes of images in the ImageNet 2012 dataset from 58.82% and 78.51% to 68.50% and 85.72%, respectively, indicating that the method can effectively improve the performance of pre-trained models with convolution structure using few-shot training data.
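The abstract borrows the beta statistic from CAPM to score intermediate outputs. As a hedged, minimal illustration of the CAPM beta itself (not the paper's actual construction; the channel/market framing and all names below are hypothetical), one can measure how strongly a single channel's activation series co-moves with the "market", here taken as an aggregate activation series:

```python
import numpy as np

def capm_beta(channel_series, market_series):
    """CAPM-style beta of one channel's activation series against a
    'market' series: beta = Cov(r_i, r_m) / Var(r_m)."""
    cov = np.cov(channel_series, market_series)[0, 1]
    return cov / market_series.var(ddof=1)

# Hypothetical activation series observed over 200 modulated samples.
rng = np.random.default_rng(42)
market = rng.normal(size=200)                         # aggregate activation
ch_high = 1.5 * market + 0.1 * rng.normal(size=200)   # strongly market-driven
ch_low = 0.1 * market + rng.normal(size=200)          # mostly idiosyncratic

print(capm_beta(ch_high, market))   # close to 1.5
print(capm_beta(ch_low, market))    # close to 0.1
```

Channels with extreme or near-zero beta could then be treated differently during the derivative-free forward pass; how the paper actually uses the statistic is not specified in this abstract.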

Key words: Capital Asset Pricing Model (CAPM), Wasserstein distance, derivative-free learning, self-attention mechanism, pre-trained model
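The Wasserstein distance listed in the keywords is the metric of optimal transport theory, used here to assess inter-class separability of representation vectors. A minimal sketch for the one-dimensional case (where the optimal transport plan simply matches sorted values; this is illustrative only, not the paper's exact usage):

```python
import numpy as np

def wasserstein_1d(u, v):
    """W1 distance between two equal-size 1-D samples. In 1-D the
    optimal transport plan matches sorted values, so W1 reduces to
    the mean absolute difference of the sorted samples."""
    return np.abs(np.sort(u) - np.sort(v)).mean()

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=2000)   # features attributed to class A
b = rng.normal(loc=3.0, size=2000)   # class B, shifted by 3
c = rng.normal(loc=0.0, size=2000)   # same distribution as class A

print(wasserstein_1d(a, b))   # close to 3.0: well-separated classes
print(wasserstein_1d(a, c))   # close to 0.0: indistinguishable classes
```

A large distance between per-class feature distributions corresponds to the "clear inter-class separability in the distribution space" that the abstract aims for.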

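The self-attention weighting step described in the abstract can be sketched as generic scaled dot-product self-attention over a set of feature vectors (the paper's exact formulation is not given here, so this is an assumption-laden illustration with Q = K = V):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_weight(feats):
    """Scaled dot-product self-attention with Q = K = V = feats.
    Returns the adaptively re-weighted features and the weights."""
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)   # pairwise similarities
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ feats, weights

feats = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 0.0, 1.0]])
weighted, w = self_attention_weight(feats)
print(w.sum(axis=-1))   # each row of the weight matrix sums to 1
```

Similar features attend to each other more strongly (here, the first two rows), which is one way the "effective features" could be adaptively emphasized before aggregation in the fully connected layer.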

CLC number: