基于小样本无梯度学习的卷积结构预训练模型性能优化方法

doi:10.11772/j.issn.1001-9081.2021020230

《计算机应用》唯一官方网站 ›› 2022, Vol. 42 ›› Issue (2): 365-374.DOI: 10.11772/j.issn.1001-9081.2021020230

• 人工智能 • 上一篇

基于小样本无梯度学习的卷积结构预训练模型性能优化方法

李亚鸣¹^,², 邢凯¹^,²(), 邓洪武¹^,², 王志勇¹^,², 胡璇¹^,²

^1.中国科学技术大学计算机科学与技术学院, 合肥 230027
^2.中国科学技术大学苏州高等研究院, 江苏苏州 215123

收稿日期:2021-02-07 修回日期:2021-03-18 接受日期:2021-03-26 发布日期:2021-04-07 出版日期:2022-02-10
通讯作者: 邢凯
作者简介:李亚鸣（1996—），男，江西赣州人，硕士研究生，主要研究方向：深度学习；
邢凯（1981—），男，江苏苏州人，副教授，博士，CCF会员，主要研究方向：操作系统、数据挖掘；
邓洪武（1996—），男，安徽安庆人，硕士研究生，主要研究方向：深度学习；
王志勇（1996—），男，河南商丘人，硕士研究生，主要研究方向：深度学习；
胡璇（1995—），女，安徽池州人，硕士研究生，主要研究方向：深度学习。

Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure

Yaming LI¹^,², Kai XING¹^,²(), Hongwu DENG¹^,², Zhiyong WANG¹^,², Xuan HU¹^,²

^1.School of Computer Science and Technology，University of Science and Technology of China，Hefei Anhui 230027，China
^2.Suzhou Institute for Advanced Research，University of Science and Technology of China，Suzhou Jiangsu 215123，China

Received:2021-02-07 Revised:2021-03-18 Accepted:2021-03-26 Online:2021-04-07 Published:2022-02-10
Contact: Kai XING
About author:LI Yaming， born in 1996， M. S. candidate. His research interests include deep learning.
XING Kai， born in 1981， Ph. D.， associate professor. His research interests include operating system， data mining.
DENG Hongwu， born in 1996， M. S. candidate. His research interests include deep learning.
WANG Zhiyong， born in 1996， M. S. candidate. His research interests include deep learning.
HU Xuan， born in 1995， M. S. candidate. Her research interests include deep learning.

摘要/Abstract

摘要：

针对卷积结构的深度学习模型在小样本学习场景中泛化性能较差的问题，以AlexNet和ResNet为例，提出一种基于小样本无梯度学习的卷积结构预训练模型的性能优化方法。首先基于因果干预对样本数据进行调制，由非时序数据生成序列数据，并基于协整检验从数据分布平稳性的角度对预训练模型进行定向修剪；然后基于资本资产定价模型（CAPM）以及最优传输理论，在预训练模型中间输出过程中进行无需梯度传播的正向学习并构建一种全新的结构，从而生成在分布空间中具有明确类间区分性的表征向量；最后基于自注意力机制对生成的有效特征进行自适应加权处理，并在全连接层对特征进行聚合，从而生成具有弱相关性的embedding向量。实验结果表明所提出的方法能够使AlexNet和ResNet卷积结构预训练模型在ImageNet 2012数据集的100类图片上的Top-1准确率分别从58.82%、78.51%提升到68.50%、85.72%，可见所提方法能够基于小样本训练数据有效提高卷积结构预训练模型的性能。

关键词: 资本资产定价模型, Wasserstein距离, 无梯度学习, 自注意力机制, 预训练模型

Abstract:

Deep learning model with convolution structure has poor generalization performance in few-shot learning scenarios. Therefore， with AlexNet and ResNet as examples， a derivative-free few-shot learning based performance optimization method of convolution structured pre-trained models was proposed. Firstly， the sample data were modulated to generate the series data from the non-series data based on causal intervention， and the pre-trained model was pruned directly based on the co-integration test from the perspective of data distribution stability. Then， based on Capital Asset Pricing Model （CAPM） and optimal transmission theory， in the intermediate output process of the pre-trained model， the forward learning without gradient propagation was carried out， and a new structure was constructed， thereby generating the representation vectors with clear inter-class distinguishability in the distribution space. Finally， the generated effective features were adaptively weighted based on the self-attention mechanism， and the features were aggregated in the fully connected layer to generate the embedding vectors with weak correlation. Experimental results indicate that the proposed method can increase the Top-1 accuracies of the AlexNet and ResNet convolution structured pre-trained models on 100 classes of images in ImageNet 2012 dataset from 58.82%， 78.51% to 68.50%， 85.72%， respectively. Therefore， the proposed method can effectively improve the performance of convolution structured pre-trained models based on few-shot training data.

Key words: Capital Asset Pricing Model (CAPM), Wasserstein distance, derivative-free learning, self-attention mechanism, pre-trained model

中图分类号:

TP18

李亚鸣, 邢凯, 邓洪武, 王志勇, 胡璇. 基于小样本无梯度学习的卷积结构预训练模型性能优化方法[J]. 计算机应用, 2022, 42(2): 365-374.

Yaming LI, Kai XING, Hongwu DENG, Zhiyong WANG, Xuan HU. Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure[J]. Journal of Computer Applications, 2022, 42(2): 365-374.

图/表 10

图1 资本资产定价模型

Fig. 1 Capital Asset Pricing Model

图2 样本数据增强

Fig. 2 Sample data augmentation

图3 本文方法的整体框架

Fig. 3 Overall framework of the proposed method

图4 自注意力机制结构

Fig. 4 Structure of self-attention mechanism

图5 随机类别对应的排序后的标准差分布

Fig. 5 Sorted standard deviation distribution corresponding to random class

图6 模型经过修剪后，单类别召回率普遍提升

Fig. 6 Most single class recall rates increasing after model pruning

图7 不同物种对应的单分类模型有效结构相似比例

Fig. 7 Similar ratios of effective structures of single classification models for different species

图8 随机选择类别，资本资产定价模型组合优化前后所有采样的收益R分布

Fig. 8 After randomly selecting a class， distribution of income R of all samples before and after combinational optimization of capital asset pricing model

图9 资本资产定价模型组合优化前后单采样点的可区分性

Fig. 9 Distinguishability of single node before and after combinational optimization of capital asset pricing model

表1 图片分类任务性能比较 ( %)

Tab. 1 Performance comparison of image classification tasks

网络模型	数据集	Top-1 Acc	Top-5 Acc
AlexNet	ImageNet 2012（100类）	58.82	83.51
AlexNet	CIFAR-100	61.29	81.34
AlexNet改进模型	ImageNet 2012（100类）	68.50	92.25
AlexNet改进模型	CIFAR-100	69.15	89.55
ResNet50	ImageNet 2012（100类）	78.51	94.20
ResNet50改进模型	ImageNet 2012（100类）	85.72	96.65

参考文献 45

1	XIE S N， GIRSHICK R， DOLLÁR P， et al. Aggregated residual transformations for deep neural networks ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 5987-5995. 10.1109/cvpr.2017.634
2	HUANG G， LIU Z， MAATEN L VAN DER， et al. Densely connected convolutional networks ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 2261-2269. 10.1109/cvpr.2017.243
3	ZHANG M R， LUCAS J， HINTON G， et al. Lookahead optimizer： k steps forward， 1 step back［C/OL］// Proceedings of the 33rd Conference on Neural Information Processing Systems. ［2021-01-22］. .
4	LILLICRAP T P， SANTORO A， MARRIS L， et al. Backpropagation and the brain［J］. Nature Reviews Neuroscience， 2020， 21（6）： 335-346. 10.1038/s41583-020-0277-3
5	KHAN A， SOHAIL A， ZAHOORA U， et al. A survey of the recent architectures of deep convolutional neural networks［J］. Artificial Intelligence Review， 2020， 53（8）： 5455-5516. 10.1007/s10462-020-09825-6
6	KINGMA D P， BA J L. Adam： a method for stochastic optimization［J］. ［EB/OL］. （2017-01-30）［2021-01-03］. .
7	DUCHI J， HAZAN E， SINGER Y. Adaptive sub gradient methods for online learning and stochastic optimization［J］. Journal of Machine Learning Research， 2011， 12： 2121-2159.
8	FERNÁNDEZ-REDONDO M， HERNÁNDEZ-ESPINOSA C. Weight initialization methods for multilayer feedforward ［C］// Proceedings of the 2001 European Symposium on Artificial Neural Networks. ［2021-01-22］. . 10.1109/ijcnn.2001.939011
9	SANTURKAR S， TSIPRAS D， ILYAS A， et al. How does batch normalization help optimization？［C］// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2018： 2488-2498.
10	HE K M， ZHANG X Y， REN S Q， et al. Deep residual learning for image recognition ［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 770-778. 10.1109/cvpr.2016.90
11	BACHLECHNER T， MAJUMDER B P， MAO H H， et al. ReZero is all you need： fast convergence at large depth［EB/OL］. （2020-06-25）［2021-02-05］. .
12	RAMACHANDRAN P， ZOPH B， LE Q V. Searching for activation functions［EB/OL］. （2017-10-27）［2021-02-05］. .
13	MISRA D， LANDSKAPE. Mish： a self-regularized non-monotonic neural activation function ［C］// Proceedings of the 2020 British Machine Vision Conference. Durham： BMVA Press， 2020： No.928.
14	GAO Z T， WANG L M， WU G S. LIP： local importance-based pooling ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer. Piscataway： IEEE. 2019： 3354-3363. 10.1109/iccv.2019.00345
15	GIRSHICK R. Fast R-CNN ［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 1440-1448. 10.1109/iccv.2015.169
16	WANG Z N， XIANG C Q， ZOU W B， et al. MMA regularization： decorrelating weights of neural networks by maximizing the minimal angles［C/OL］// Proceedings of the 34th Conference on Neural Information Processing Systems. ［2021-01-22］. . 10.1109/lsp.2020.3037512
17	JENSEN M C， BLACK F， SCHOLES M S. The capital asset pricing model： some empirical tests［M］// JENSEN M C. Studies in the Theory of Capital Markets. New York： Praeger Publishers Inc.， 1972： 25-28.
18	LECUN Y， BOTTOU L， BENGIO Y， et al. Gradient-based learning applied to document recognition［J］. Proceedings of the IEEE， 1998， 86（11）： 2278-2324. 10.1109/5.726791
19	KRIZHEVSKY A， SUTSKEVER I， HINTON G E. ImageNet classification with deep convolutional neural networks ［C］// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2012： 1097-1105.
20	SRIVASTAVA N， HINTON G， KRIZHEVSKY A， et al. Dropout： a simple way to prevent neural networks from overfitting［J］. Journal of Machine Learning Research， 2014， 15： 1929-1958.
21	SIMONYAN K， ZISSERMAN A. Very deep convolutional networks for large-scale image recognition［EB/OL］. （2015-04-10）［2021-01-20］. . 10.5244/c.28.6
22	SANDLER M， HOWARD A， ZHU M L， et al. MobileNetV2： inverted residuals and linear bottlenecks ［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 4510-4520. 10.1109/cvpr.2018.00474
23	LIN T Y， DOLLÁR P， GIRSHICK R， et al. Feature pyramid networks for object detection ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 936-944. 10.1109/cvpr.2017.106
24	IGNATOV A， GOOL L VAN， TIMOFTE R. Replacing mobile camera ISP with a single deep learning model ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway： IEEE， 2020： 2275-2285. 10.1109/cvprw50498.2020.00276
25	BLUME M E， FRIEND I. A new look at the capital asset pricing model［J］. The Journal of Finance， 1973， 28（1）： 19-34. 10.1111/j.1540-6261.1973.tb01342.x
26	PURKAIT P， ZHAO C， ZACH C. SPP-net： deep absolute pose regression with synthetic views［EB/OL］. （2017-12-09）［2021-02-05］. .
27	VILLANI C. Optimal Transport： Old and New［M］. Berlin： Springer， 2009： 131-140. 10.1007/978-3-540-71050-9_28
28	PEYRÉ G， CUTURI M. Computational optimal transport： with applications to data science［J］. Foundations and Trends in Machine Learning， 2019， 11（5/6）： 355-607. 10.1561/2200000073
29	DANILA B， YU Y， MARSH J A， et al. Optimal transport on complex networks［J］. Physical Review E， Statistical， Nonlinear， and Soft Matter Physics， 2006， 74（4 Pt 2）： No.046106. 10.1103/physreve.74.046106
30	ARJOVSKY M， CHINTALA S， BOTTOU L. Wasserstein generative adversarial networks ［C］// Proceedings of 34th Machine Learning Research. New York： JMLR.org， 2017： 214-223.
31	ALBAWI S， MOHAMMED T A， AL-AZAWI S. Understanding of a convolutional neural network ［C］// Proceedings of the 2017 International Conference on Engineering and Technology. Piscataway： IEEE， 2017： 1-6. 10.1109/icengtechnol.2017.8308186
32	PEARL J， MACKENZIE D. The Book of Why： the New Science of Cause and Effect［M］. New York： Basic Books， 2018： 6-29.
33	MELLOR J， TURNER J， STORKEY A， et al. Neural architecture search without training［J］. Journal of Machine Learning Research， 2019， 20： 1-21.
34	WEI W W S. Time Series Analysis： Univariate and Multivariate Methods［M］. 2nd ed. London： Pearson， 2006： 15-80.
35	KREMERS J J M， ERICSSON N R， DOLADO J J. The power of cointegration tests［J］. Oxford Bulletin of Economics and Statistics， 1992， 54（3）： 325-348. 10.1111/j.1468-0084.1992.tb00005.x
36	HYLLEBERG S， ENGLE R F， GRANGER C W J， et al. Seasonal integration and cointegration［J］. Journal of Econometrics， 1990， 44（1/2）： 215-238. 10.1016/0304-4076(90)90080-d
37	MOLCHANOV P， MALLYA A， TYREE S， et al. Importance estimation for neural network pruning ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 11256-11264. 10.1109/cvpr.2019.01152
38	LIU Z， SUN M J， ZHOU T H， et al. Rethinking the value of network pruning［EB/OL］. （2019-03-05）［2021-02-05］. . 10.1002/mrm.27229
39	GEIRHOS R， RUBISCH P， MICHAELIS C， et al. ImageNet-trained CNNs are biased towards texture； increasing shape bias improves accuracy and robustness［EB/OL］. （2019-01-14）［2021-02-05］. . 10.1167/19.10.209c
40	LI X L， ZHOU Y B， WU T F， et al. Learn to grow： a continual structure learning framework for overcoming catastrophic forgetting ［C］// Proceedings of the 36th International Conference on Machine Learning. New York： JMLR.org， 2019： 3925-3934.
41	WORTSMAN M， FARHADI A， RASTEGARI M. Discovering neural wirings［C/OL］// Proceedings of the 33rd Conference on Neural Information Processing Systems. ［2021-01-22］. . 10.1109/cvpr42600.2020.01191
42	KIM Y， PARK W， ROH M C， et al. GroupFace： learning latent groups and constructing group-based representations for face recognition ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 5620-5629. 10.1109/cvpr42600.2020.00566
43	VASWANI A， SHAZEER N， PARMAR N， et al. Attention is all you need ［C］// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2017： 6000-6010. 10.1016/s0262-4079(17)32358-8
44	RUSSAKOVSKY O， DENG J， SU H， et al. ImageNet large scale visual recognition challenge［J］. International Journal of Computer Vision， 2015， 115（3）： 211-252. 10.1007/s11263-015-0816-y
45	KRIZHEVSKY A. learning multiple layers of features from tiny images［R/OL］. ［2021-01-25］. .

[1]	党伟超, 李涛, 白尚旺, 高改梅, 刘春霞. 基于自注意力长短期记忆网络的Web软件系统实时剩余寿命预测方法[J]. 计算机应用, 2021, 41(8): 2346-2351.
[2]	甘岚, 沈鸿飞, 王瑶, 张跃进. 基于改进DCGAN的数据增强方法[J]. 计算机应用, 2021, 41(5): 1305-1313.
[3]	刘睿珩, 叶霞, 岳增营. 面向自然语言处理任务的预训练模型综述[J]. 《计算机应用》唯一官方网站, 2021, 41(5): 1236-1246.
[4]	李慧慧, 闫坤, 张李轩, 刘威, 李执. 基于MobileNetV2的圆形指针式仪表识别系统[J]. 计算机应用, 2021, 41(4): 1214-1220.
[5]	姚博文, 曾碧卿, 蔡剑, 丁美荣. 基于预训练和多层次信息的中文人物关系抽取模型[J]. 《计算机应用》唯一官方网站, 2021, 41(12): 3637-3644.
[6]	罗俊, 陈黎飞. 基于BERT的不完全数据情感分类[J]. 计算机应用, 2021, 41(1): 139-144.
[7]	陈佳伟, 韩芳, 王直杰. 基于自注意力门控图卷积网络的特定目标情感分析[J]. 计算机应用, 2020, 40(8): 2202-2206.
[8]	李生武, 张选德. 基于自注意力机制的多域卷积神经网络的视觉追踪[J]. 计算机应用, 2020, 40(8): 2219-2224.
[9]	许一宁, 何小海, 张津, 卿粼波. 基于多层次分辨率递进生成对抗网络的文本生成图像方法[J]. 计算机应用, 2020, 40(12): 3612-3617.
[10]	张小川, 戴旭尧, 刘璐, 冯天硕. 融合多头自注意力机制的中文短文本分类模型[J]. 计算机应用, 2020, 40(12): 3485-3489.
[11]	王昆, 郑毅, 方书雅, 刘守印. 基于文本筛选和改进BERT的长文本方面级情感分析[J]. 计算机应用, 2020, 40(10): 2838-2844.
[12]	曹小鹿, 辛云宏. 基于Wasserstein距离概率分布模型的非线性降维[J]. 计算机应用, 2017, 37(10): 2819-2822.

基于小样本无梯度学习的卷积结构预训练模型性能优化方法

Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure

RichHTML

PDF

可视化

摘要/Abstract

引用本文

使用本文

图/表 10

参考文献 45

相关文章 12

编辑推荐

Metrics