Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 365-374. DOI: 10.11772/j.issn.1001-9081.2021020230
Special topic: Artificial Intelligence
Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure
Yaming LI1,2, Kai XING1,2, Hongwu DENG1,2, Zhiyong WANG1,2, Xuan HU1,2
Received: 2021-02-07
Revised: 2021-03-18
Accepted: 2021-03-26
Online: 2022-02-11
Published: 2022-02-10
Contact: Kai XING
About author: LI Yaming, born in 1996, M. S. candidate. His research interests include deep learning.
Abstract: Deep learning models with convolutional structure generalize poorly in few-shot learning scenarios. Taking AlexNet and ResNet as examples, a performance optimization method for pre-trained models with convolutional structure was proposed based on derivative-free few-shot learning. First, the sample data were modulated via causal intervention to generate sequential data from non-time-series data, and the pre-trained model was pruned directionally based on a cointegration test, from the perspective of the stationarity of the data distribution. Then, based on the Capital Asset Pricing Model (CAPM) and optimal transport theory, forward learning without gradient propagation was carried out on the intermediate outputs of the pre-trained model, and a brand-new structure was constructed, so as to generate representation vectors with clear inter-class separability in the distribution space. Finally, the effective features thus generated were adaptively weighted by a self-attention mechanism and aggregated in the fully connected layer, yielding embedding vectors with weak correlation. Experimental results show that the proposed method raises the Top-1 accuracy of the AlexNet and ResNet pre-trained convolutional models on 100 classes of images from the ImageNet 2012 dataset from 58.82% and 78.51% to 68.50% and 85.72% respectively, demonstrating that the method can effectively improve the performance of pre-trained convolutional models with few-shot training data.
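To make the pruning step above concrete, the following minimal Python sketch is offered as an illustrative assumption, not the authors' implementation: `channel_responses` is a hypothetical array of intermediate channel outputs over a sequence of causally modulated samples, an Augmented Dickey-Fuller stationarity test is used as a simple stand-in for the paper's cointegration test, and the 0.05 threshold is likewise assumed.

```python
# Hedged sketch of stationarity-based channel pruning (not the paper's code).
# Assumption: `channel_responses` holds responses of C channels to a sequence
# of T causally modulated samples; channels whose response series fail a
# stationarity test are pruned.
import numpy as np
from statsmodels.tsa.stattools import adfuller  # Augmented Dickey-Fuller test

rng = np.random.default_rng(0)

C, T = 16, 200
channel_responses = rng.normal(size=(C, T))        # synthetic stand-in data
channel_responses[:4] += np.linspace(0.0, 3.0, T)  # give 4 channels a drift

keep_mask = np.zeros(C, dtype=bool)
for c in range(C):
    # Small p-value -> reject the unit-root hypothesis -> series is stationary.
    p_value = adfuller(channel_responses[c])[1]
    keep_mask[c] = p_value < 0.05                  # assumed threshold

print(f"kept {keep_mask.sum()}/{C} channels; pruned: {np.where(~keep_mask)[0]}")
```

With this toy data, the drifting channels are typically the ones flagged as non-stationary and pruned, while the channels with a stable response distribution are retained.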
Yaming LI, Kai XING, Hongwu DENG, Zhiyong WANG, Xuan HU. Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure[J]. Journal of Computer Applications, 2022, 42(2): 365-374.
Fig. 8 Distribution of returns R over all samplings, before and after CAPM portfolio optimization, for a randomly selected class
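As a companion to the caption above, the following hedged Python sketch (synthetic data and hypothetical names throughout, not the paper's code) treats each feature's per-sample responses as asset "returns" R and combines them with tangency-portfolio weights w ∝ Σ⁻¹(μ − r_f), the mean-variance combination underlying CAPM, which typically tightens the combined return distribution:

```python
# Hedged sketch of CAPM-style feature combination (not the paper's code).
import numpy as np

rng = np.random.default_rng(1)
n_features, n_samples = 8, 500

# Synthetic "returns": each feature gets its own mean and volatility.
mu_true = rng.uniform(0.5, 1.5, n_features)[:, None]
sigma_true = rng.uniform(0.5, 2.0, n_features)[:, None]
R = rng.normal(loc=mu_true, scale=sigma_true, size=(n_features, n_samples))

mu = R.mean(axis=1)                 # estimated mean return per feature
cov = np.cov(R)                     # covariance of feature returns
r_f = 0.0                           # assumed risk-free rate
w = np.linalg.solve(cov, mu - r_f)  # tangency-portfolio direction
w /= w.sum()                        # normalize weights to sum to 1

R_combined = w @ R                  # combined return for every sampling
print(f"mean per-feature std: {R.std(axis=1).mean():.2f}, "
      f"combined std: {R_combined.std():.2f}")
```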
Tab. 1 Performance comparison on image classification tasks (unit: %)

Network model | Dataset | Top-1 Acc | Top-5 Acc
---|---|---|---
AlexNet | ImageNet 2012 (100 classes) | 58.82 | 83.51
AlexNet | CIFAR-100 | 61.29 | 81.34
Improved AlexNet | ImageNet 2012 (100 classes) | 68.50 | 92.25
Improved AlexNet | CIFAR-100 | 69.15 | 89.55
ResNet50 | ImageNet 2012 (100 classes) | 78.51 | 94.20
Improved ResNet50 | ImageNet 2012 (100 classes) | 85.72 | 96.65