Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure

doi:10.11772/j.issn.1001-9081.2021020230

Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 365-374.DOI: 10.11772/j.issn.1001-9081.2021020230

• Artificial intelligence • Previous Articles

Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure

Yaming LI¹^,², Kai XING¹^,²(), Hongwu DENG¹^,², Zhiyong WANG¹^,², Xuan HU¹^,²

^1.School of Computer Science and Technology，University of Science and Technology of China，Hefei Anhui 230027，China
^2.Suzhou Institute for Advanced Research，University of Science and Technology of China，Suzhou Jiangsu 215123，China

Received:2021-02-07 Revised:2021-03-18 Accepted:2021-03-26 Online:2021-04-07 Published:2022-02-10
Contact: Kai XING
About author:LI Yaming， born in 1996， M. S. candidate. His research interests include deep learning.
XING Kai， born in 1981， Ph. D.， associate professor. His research interests include operating system， data mining.
DENG Hongwu， born in 1996， M. S. candidate. His research interests include deep learning.
WANG Zhiyong， born in 1996， M. S. candidate. His research interests include deep learning.
HU Xuan， born in 1995， M. S. candidate. Her research interests include deep learning.

基于小样本无梯度学习的卷积结构预训练模型性能优化方法

李亚鸣¹^,², 邢凯¹^,²(), 邓洪武¹^,², 王志勇¹^,², 胡璇¹^,²

^1.中国科学技术大学计算机科学与技术学院, 合肥 230027
^2.中国科学技术大学苏州高等研究院, 江苏苏州 215123

通讯作者: 邢凯
作者简介:李亚鸣（1996—），男，江西赣州人，硕士研究生，主要研究方向：深度学习；
邢凯（1981—），男，江苏苏州人，副教授，博士，CCF会员，主要研究方向：操作系统、数据挖掘；
邓洪武（1996—），男，安徽安庆人，硕士研究生，主要研究方向：深度学习；
王志勇（1996—），男，河南商丘人，硕士研究生，主要研究方向：深度学习；
胡璇（1995—），女，安徽池州人，硕士研究生，主要研究方向：深度学习。

Abstract

Abstract:

Deep learning model with convolution structure has poor generalization performance in few-shot learning scenarios. Therefore， with AlexNet and ResNet as examples， a derivative-free few-shot learning based performance optimization method of convolution structured pre-trained models was proposed. Firstly， the sample data were modulated to generate the series data from the non-series data based on causal intervention， and the pre-trained model was pruned directly based on the co-integration test from the perspective of data distribution stability. Then， based on Capital Asset Pricing Model （CAPM） and optimal transmission theory， in the intermediate output process of the pre-trained model， the forward learning without gradient propagation was carried out， and a new structure was constructed， thereby generating the representation vectors with clear inter-class distinguishability in the distribution space. Finally， the generated effective features were adaptively weighted based on the self-attention mechanism， and the features were aggregated in the fully connected layer to generate the embedding vectors with weak correlation. Experimental results indicate that the proposed method can increase the Top-1 accuracies of the AlexNet and ResNet convolution structured pre-trained models on 100 classes of images in ImageNet 2012 dataset from 58.82%， 78.51% to 68.50%， 85.72%， respectively. Therefore， the proposed method can effectively improve the performance of convolution structured pre-trained models based on few-shot training data.

Key words: Capital Asset Pricing Model (CAPM), Wasserstein distance, derivative-free learning, self-attention mechanism, pre-trained model

摘要：

针对卷积结构的深度学习模型在小样本学习场景中泛化性能较差的问题，以AlexNet和ResNet为例，提出一种基于小样本无梯度学习的卷积结构预训练模型的性能优化方法。首先基于因果干预对样本数据进行调制，由非时序数据生成序列数据，并基于协整检验从数据分布平稳性的角度对预训练模型进行定向修剪；然后基于资本资产定价模型（CAPM）以及最优传输理论，在预训练模型中间输出过程中进行无需梯度传播的正向学习并构建一种全新的结构，从而生成在分布空间中具有明确类间区分性的表征向量；最后基于自注意力机制对生成的有效特征进行自适应加权处理，并在全连接层对特征进行聚合，从而生成具有弱相关性的embedding向量。实验结果表明所提出的方法能够使AlexNet和ResNet卷积结构预训练模型在ImageNet 2012数据集的100类图片上的Top-1准确率分别从58.82%、78.51%提升到68.50%、85.72%，可见所提方法能够基于小样本训练数据有效提高卷积结构预训练模型的性能。

关键词: 资本资产定价模型, Wasserstein距离, 无梯度学习, 自注意力机制, 预训练模型

CLC Number:

TP18

Yaming LI, Kai XING, Hongwu DENG, Zhiyong WANG, Xuan HU. Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure[J]. Journal of Computer Applications, 2022, 42(2): 365-374.

李亚鸣, 邢凯, 邓洪武, 王志勇, 胡璇. 基于小样本无梯度学习的卷积结构预训练模型性能优化方法[J]. 《计算机应用》唯一官方网站, 2022, 42(2): 365-374.

Figures/Tables 10

Fig. 1 Capital Asset Pricing Model

Fig. 2 Sample data augmentation

Fig. 3 Overall framework of the proposed method

Fig. 4 Structure of self-attention mechanism

Fig. 5 Sorted standard deviation distribution corresponding to random class

Fig. 6 Most single class recall rates increasing after model pruning

Fig. 7 Similar ratios of effective structures of single classification models for different species

Fig. 8 After randomly selecting a class， distribution of income R of all samples before and after combinational optimization of capital asset pricing model

Fig. 9 Distinguishability of single node before and after combinational optimization of capital asset pricing model

Tab. 1 Performance comparison of image classification tasks

网络模型	数据集	Top-1 Acc	Top-5 Acc
AlexNet	ImageNet 2012（100类）	58.82	83.51
AlexNet	CIFAR-100	61.29	81.34
AlexNet改进模型	ImageNet 2012（100类）	68.50	92.25
AlexNet改进模型	CIFAR-100	69.15	89.55
ResNet50	ImageNet 2012（100类）	78.51	94.20
ResNet50改进模型	ImageNet 2012（100类）	85.72	96.65

References 45

1	XIE S N， GIRSHICK R， DOLLÁR P， et al. Aggregated residual transformations for deep neural networks ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 5987-5995. 10.1109/cvpr.2017.634
2	HUANG G， LIU Z， MAATEN L VAN DER， et al. Densely connected convolutional networks ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 2261-2269. 10.1109/cvpr.2017.243
3	ZHANG M R， LUCAS J， HINTON G， et al. Lookahead optimizer： k steps forward， 1 step back［C/OL］// Proceedings of the 33rd Conference on Neural Information Processing Systems. ［2021-01-22］. .
4	LILLICRAP T P， SANTORO A， MARRIS L， et al. Backpropagation and the brain［J］. Nature Reviews Neuroscience， 2020， 21（6）： 335-346. 10.1038/s41583-020-0277-3
5	KHAN A， SOHAIL A， ZAHOORA U， et al. A survey of the recent architectures of deep convolutional neural networks［J］. Artificial Intelligence Review， 2020， 53（8）： 5455-5516. 10.1007/s10462-020-09825-6
6	KINGMA D P， BA J L. Adam： a method for stochastic optimization［J］. ［EB/OL］. （2017-01-30）［2021-01-03］. .
7	DUCHI J， HAZAN E， SINGER Y. Adaptive sub gradient methods for online learning and stochastic optimization［J］. Journal of Machine Learning Research， 2011， 12： 2121-2159.
8	FERNÁNDEZ-REDONDO M， HERNÁNDEZ-ESPINOSA C. Weight initialization methods for multilayer feedforward ［C］// Proceedings of the 2001 European Symposium on Artificial Neural Networks. ［2021-01-22］. . 10.1109/ijcnn.2001.939011
9	SANTURKAR S， TSIPRAS D， ILYAS A， et al. How does batch normalization help optimization？［C］// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2018： 2488-2498.
10	HE K M， ZHANG X Y， REN S Q， et al. Deep residual learning for image recognition ［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 770-778. 10.1109/cvpr.2016.90
11	BACHLECHNER T， MAJUMDER B P， MAO H H， et al. ReZero is all you need： fast convergence at large depth［EB/OL］. （2020-06-25）［2021-02-05］. .
12	RAMACHANDRAN P， ZOPH B， LE Q V. Searching for activation functions［EB/OL］. （2017-10-27）［2021-02-05］. .
13	MISRA D， LANDSKAPE. Mish： a self-regularized non-monotonic neural activation function ［C］// Proceedings of the 2020 British Machine Vision Conference. Durham： BMVA Press， 2020： No.928.
14	GAO Z T， WANG L M， WU G S. LIP： local importance-based pooling ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer. Piscataway： IEEE. 2019： 3354-3363. 10.1109/iccv.2019.00345
15	GIRSHICK R. Fast R-CNN ［C］// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2015： 1440-1448. 10.1109/iccv.2015.169
16	WANG Z N， XIANG C Q， ZOU W B， et al. MMA regularization： decorrelating weights of neural networks by maximizing the minimal angles［C/OL］// Proceedings of the 34th Conference on Neural Information Processing Systems. ［2021-01-22］. . 10.1109/lsp.2020.3037512
17	JENSEN M C， BLACK F， SCHOLES M S. The capital asset pricing model： some empirical tests［M］// JENSEN M C. Studies in the Theory of Capital Markets. New York： Praeger Publishers Inc.， 1972： 25-28.
18	LECUN Y， BOTTOU L， BENGIO Y， et al. Gradient-based learning applied to document recognition［J］. Proceedings of the IEEE， 1998， 86（11）： 2278-2324. 10.1109/5.726791
19	KRIZHEVSKY A， SUTSKEVER I， HINTON G E. ImageNet classification with deep convolutional neural networks ［C］// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2012： 1097-1105.
20	SRIVASTAVA N， HINTON G， KRIZHEVSKY A， et al. Dropout： a simple way to prevent neural networks from overfitting［J］. Journal of Machine Learning Research， 2014， 15： 1929-1958.
21	SIMONYAN K， ZISSERMAN A. Very deep convolutional networks for large-scale image recognition［EB/OL］. （2015-04-10）［2021-01-20］. . 10.5244/c.28.6
22	SANDLER M， HOWARD A， ZHU M L， et al. MobileNetV2： inverted residuals and linear bottlenecks ［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 4510-4520. 10.1109/cvpr.2018.00474
23	LIN T Y， DOLLÁR P， GIRSHICK R， et al. Feature pyramid networks for object detection ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2017： 936-944. 10.1109/cvpr.2017.106
24	IGNATOV A， GOOL L VAN， TIMOFTE R. Replacing mobile camera ISP with a single deep learning model ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway： IEEE， 2020： 2275-2285. 10.1109/cvprw50498.2020.00276
25	BLUME M E， FRIEND I. A new look at the capital asset pricing model［J］. The Journal of Finance， 1973， 28（1）： 19-34. 10.1111/j.1540-6261.1973.tb01342.x
26	PURKAIT P， ZHAO C， ZACH C. SPP-net： deep absolute pose regression with synthetic views［EB/OL］. （2017-12-09）［2021-02-05］. .
27	VILLANI C. Optimal Transport： Old and New［M］. Berlin： Springer， 2009： 131-140. 10.1007/978-3-540-71050-9_28
28	PEYRÉ G， CUTURI M. Computational optimal transport： with applications to data science［J］. Foundations and Trends in Machine Learning， 2019， 11（5/6）： 355-607. 10.1561/2200000073
29	DANILA B， YU Y， MARSH J A， et al. Optimal transport on complex networks［J］. Physical Review E， Statistical， Nonlinear， and Soft Matter Physics， 2006， 74（4 Pt 2）： No.046106. 10.1103/physreve.74.046106
30	ARJOVSKY M， CHINTALA S， BOTTOU L. Wasserstein generative adversarial networks ［C］// Proceedings of 34th Machine Learning Research. New York： JMLR.org， 2017： 214-223.
31	ALBAWI S， MOHAMMED T A， AL-AZAWI S. Understanding of a convolutional neural network ［C］// Proceedings of the 2017 International Conference on Engineering and Technology. Piscataway： IEEE， 2017： 1-6. 10.1109/icengtechnol.2017.8308186
32	PEARL J， MACKENZIE D. The Book of Why： the New Science of Cause and Effect［M］. New York： Basic Books， 2018： 6-29.
33	MELLOR J， TURNER J， STORKEY A， et al. Neural architecture search without training［J］. Journal of Machine Learning Research， 2019， 20： 1-21.
34	WEI W W S. Time Series Analysis： Univariate and Multivariate Methods［M］. 2nd ed. London： Pearson， 2006： 15-80.
35	KREMERS J J M， ERICSSON N R， DOLADO J J. The power of cointegration tests［J］. Oxford Bulletin of Economics and Statistics， 1992， 54（3）： 325-348. 10.1111/j.1468-0084.1992.tb00005.x
36	HYLLEBERG S， ENGLE R F， GRANGER C W J， et al. Seasonal integration and cointegration［J］. Journal of Econometrics， 1990， 44（1/2）： 215-238. 10.1016/0304-4076(90)90080-d
37	MOLCHANOV P， MALLYA A， TYREE S， et al. Importance estimation for neural network pruning ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 11256-11264. 10.1109/cvpr.2019.01152
38	LIU Z， SUN M J， ZHOU T H， et al. Rethinking the value of network pruning［EB/OL］. （2019-03-05）［2021-02-05］. . 10.1002/mrm.27229
39	GEIRHOS R， RUBISCH P， MICHAELIS C， et al. ImageNet-trained CNNs are biased towards texture； increasing shape bias improves accuracy and robustness［EB/OL］. （2019-01-14）［2021-02-05］. . 10.1167/19.10.209c
40	LI X L， ZHOU Y B， WU T F， et al. Learn to grow： a continual structure learning framework for overcoming catastrophic forgetting ［C］// Proceedings of the 36th International Conference on Machine Learning. New York： JMLR.org， 2019： 3925-3934.
41	WORTSMAN M， FARHADI A， RASTEGARI M. Discovering neural wirings［C/OL］// Proceedings of the 33rd Conference on Neural Information Processing Systems. ［2021-01-22］. . 10.1109/cvpr42600.2020.01191
42	KIM Y， PARK W， ROH M C， et al. GroupFace： learning latent groups and constructing group-based representations for face recognition ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 5620-5629. 10.1109/cvpr42600.2020.00566
43	VASWANI A， SHAZEER N， PARMAR N， et al. Attention is all you need ［C］// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook， NY： Curran Associates Inc.， 2017： 6000-6010. 10.1016/s0262-4079(17)32358-8
44	RUSSAKOVSKY O， DENG J， SU H， et al. ImageNet large scale visual recognition challenge［J］. International Journal of Computer Vision， 2015， 115（3）： 211-252. 10.1007/s11263-015-0816-y
45	KRIZHEVSKY A. learning multiple layers of features from tiny images［R/OL］. ［2021-01-25］. .

[1]	DANG Weichao, LI Tao, BAI Shangwang, GAO Gaimei, LIU Chunxia. Real-time remaining life prediction method of Web software system based on self-attention-long short-term memory network [J]. Journal of Computer Applications, 2021, 41(8): 2346-2351.
[2]	GAN Lan, SHEN Hongfei, WANG Yao, ZHANG Yuejin. Data augmentation method based on improved deep convolutional generative adversarial networks [J]. Journal of Computer Applications, 2021, 41(5): 1305-1313.
[3]	LIU Ruiheng, YE Xia, YUE Zengying. Review of pre-trained models for natural language processing tasks [J]. Journal of Computer Applications, 2021, 41(5): 1236-1246.
[4]	LI Huihui, YAN Kun, ZHANG Lixuan, LIU Wei, LI Zhi. Circular pointer instrument recognition system based on MobileNetV2 [J]. Journal of Computer Applications, 2021, 41(4): 1214-1220.
[5]	CHEN Jiawei, HAN Fang, WANG Zhijie. Aspect-based sentiment analysis with self-attention gated graph convolutional network [J]. Journal of Computer Applications, 2020, 40(8): 2202-2206.
[6]	LI Shengwu, ZHANG Xuande. Multi-domain convolutional neural network based on self-attention mechanism for visual tracking [J]. Journal of Computer Applications, 2020, 40(8): 2219-2224.
[7]	XU Yining, HE Xiaohai, ZHANG Jin, QING Linbo. Text-to-image synthesis method based on multi-level progressive resolution generative adversarial networks [J]. Journal of Computer Applications, 2020, 40(12): 3612-3617.
[8]	ZHANG Xiaochuan, DAI Xuyao, LIU Lu, FENG Tianshuo. Chinese short text classification model with multi-head self-attention mechanism [J]. Journal of Computer Applications, 2020, 40(12): 3485-3489.
[9]	WANG Kun, ZHENG Yi, FANG Shuya, LIU Shouyin. Long text aspect-level sentiment analysis based on text filtering and improved BERT [J]. Journal of Computer Applications, 2020, 40(10): 2838-2844.
[10]	CAO Xiaolu, XIN Yunhong. Probabilistic distribution model based on Wasserstein distance for nonlinear dimensionality reduction [J]. Journal of Computer Applications, 2017, 37(10): 2819-2822.

Derivative-free few-shot learning based performance optimization method of pre-trained models with convolution structure

基于小样本无梯度学习的卷积结构预训练模型性能优化方法

RichHTML

PDF

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 10

References 45

Related Articles 10

Recommended Articles

Metrics