Improved method of convolution neural network based on matrix decomposition

doi:10.11772/j.issn.1001-9081.2022010032

Abstract

To address the difficulty of optimizing traditional Convolutional Neural Networks (CNNs) during training, an improved CNN method based on matrix decomposition is proposed. First, during training, the convolution kernel parameter tensor of the model's convolution layers is rewritten through matrix decomposition as the product of several parameter matrices, which overparameterizes the model. Second, these additional linear parameters take part in back propagation and are updated synchronously with the model's other parameters, which improves the gradient descent optimization process. After training, the matrix product is merged back into standard convolution kernel parameters, so the computational complexity of forward propagation at inference is the same as before the improvement. Using thin QR decomposition and reduced Singular Value Decomposition (SVD), classification experiments were carried out on the CIFAR-10 (Canadian Institute For Advanced Research, 10 classes) dataset, and further generalization experiments used different image classification datasets and different initialization methods. Experimental results show that, for 7 models of different depths from the Visual Geometry Group (VGG) and Residual Network (ResNet) families, the versions based on matrix decomposition achieve higher classification accuracy than the original CNN models. The matrix decomposition method thus lets a CNN reach higher classification accuracy faster and eventually converge to a better local optimum.

Key words: Convolutional Neural Network (CNN), matrix decomposition, Singular Value Decomposition (SVD), overparameterization, image classification
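
As a minimal sketch of the decompose-then-merge idea described in the abstract, the NumPy snippet below flattens a hypothetical convolution kernel tensor into a matrix, factors it with thin QR and with reduced SVD, and then merges the factors back into a kernel of the original shape. The layer shape, the one-row-per-output-channel reshape, and all variable names are illustrative assumptions rather than the authors' implementation; in real training the factors would be the learnable parameters updated by back propagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shape (an assumption): 8 output channels,
# 4 input channels, 3x3 kernels.
c_out, c_in, k = 8, 4, 3
kernel = rng.standard_normal((c_out, c_in, k, k))

# Flatten the 4-D kernel tensor into a 2-D matrix, one row per output channel.
# This particular reshape is an assumption made for illustration.
W = kernel.reshape(c_out, c_in * k * k)            # shape (8, 36)

# Thin QR decomposition: W = Q @ R with Q (8, 8) and R (8, 36).
Q, R = np.linalg.qr(W, mode="reduced")

# Reduced SVD: W = U @ diag(s) @ Vt with U (8, 8), s (8,), Vt (8, 36).
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# The factored forms hold more parameters than the original matrix,
# which is the overparameterization used during training.
params_orig = W.size
params_qr = Q.size + R.size
params_svd = U.size + s.size + Vt.size

# After training, the product of the factors is merged back into a
# standard kernel, so inference uses a layer of the original size.
kernel_qr = (Q @ R).reshape(c_out, c_in, k, k)
kernel_svd = ((U * s) @ Vt).reshape(c_out, c_in, k, k)

print(params_orig, params_qr, params_svd)
print(np.allclose(kernel_qr, kernel), np.allclose(kernel_svd, kernel))
```

Only the merged kernel is needed at inference, which is why the forward-pass cost matches that of the original layer.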


CLC Number: TP183

Zhenliang LI (李振亮), Bo LI (李波). Improved method of convolution neural network based on matrix decomposition [J]. Journal of Computer Applications, 2023, 43(3): 685-691.


Model	Accuracy/%	Training time/s	Inference time/ms
VGG11	85.58	4 004	4 596
VGG11+TQRD	87.42	4 173	4 597
VGG11+RSVD	86.75	4 367	4 489
VGG13	88.22	4 929	4 770
VGG13+TQRD	89.33	5 402	4 795
VGG13+RSVD	88.87	5 439	4 785
VGG16	86.24	5 762	5 025
VGG16+TQRD	87.19	6 242	5 018
VGG16+RSVD	86.40	6 579	5 098
VGG19	86.34	6 474	5 341
VGG19+TQRD	87.39	7 094	5 364
VGG19+RSVD	87.21	7 522	5 365
Note: TQRD = thin QR decomposition; RSVD = reduced SVD.
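
To make the trade-off in the table above easier to read, the short sketch below recomputes, for each decomposed VGG model, the accuracy gain over its baseline and the relative change in training and inference time. The numbers are copied from the table; the dictionary layout and the helper name `compare` are hypothetical and used only for illustration.

```python
# Rows copied from the VGG results table:
# (accuracy %, training time s, inference time ms).
vgg_results = {
    "VGG11": (85.58, 4004, 4596),
    "VGG11+TQRD": (87.42, 4173, 4597),
    "VGG11+RSVD": (86.75, 4367, 4489),
    "VGG13": (88.22, 4929, 4770),
    "VGG13+TQRD": (89.33, 5402, 4795),
    "VGG13+RSVD": (88.87, 5439, 4785),
    "VGG16": (86.24, 5762, 5025),
    "VGG16+TQRD": (87.19, 6242, 5018),
    "VGG16+RSVD": (86.40, 6579, 5098),
    "VGG19": (86.34, 6474, 5341),
    "VGG19+TQRD": (87.39, 7094, 5364),
    "VGG19+RSVD": (87.21, 7522, 5365),
}


def compare(results, base, variant):
    """Return (accuracy gain in points, training time change in %,
    inference time change in %) of `variant` relative to `base`."""
    acc_b, train_b, infer_b = results[base]
    acc_v, train_v, infer_v = results[variant]
    gain = round(acc_v - acc_b, 2)
    train_pct = round(100 * (train_v / train_b - 1), 1)
    infer_pct = round(100 * (infer_v / infer_b - 1), 1)
    return gain, train_pct, infer_pct


for base in ("VGG11", "VGG13", "VGG16", "VGG19"):
    for suffix in ("TQRD", "RSVD"):
        name = f"{base}+{suffix}"
        gain, train_pct, infer_pct = compare(vgg_results, base, name)
        print(f"{name}: +{gain} points, training {train_pct:+}%, "
              f"inference {infer_pct:+}%")
```

Under this reading, the extra cost shows up mainly in training time, while inference time stays within a few percent of the baseline, in line with the abstract's statement that the merged kernels keep the forward-pass complexity unchanged.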


Model	Accuracy/%	Training time/s	Inference time/ms
ResNet18	87.00	9 172	5 980
ResNet18+TQRD	87.66	9 616	6 091
ResNet18+RSVD	87.61	10 543	6 063
ResNet34	87.64	14 603	7 357
ResNet34+TQRD	89.25	15 605	7 468
ResNet34+RSVD	88.27	15 989	7 348
ResNet50a	85.96	21 727	10 764
ResNet50a+TQRD	86.29	21 254	10 774
ResNet50a+RSVD	86.04	21 712	10 738
ResNet50b	85.96	21 727	10 764
ResNet50b+TQRD	86.81	21 334	11 107
ResNet50b+RSVD	87.11	22 054	11 013


VGG11 module					Accuracy/%	
C1	C2	C3	C4	C5	TQRD	RSVD
—	—	—	—	—	85.58	85.58
√	—	—	—	—	86.83	86.81
—	√	—	—	—	86.07	85.52
—	—	√	—	—	86.66	85.65
—	—	—	√	—	85.65	84.66
—	—	—	—	√	85.05	85.05
√	√	—	—	—	86.47	86.27
√	√	√	—	—	87.40	87.25
√	√	√	√	—	87.57	86.54
√	√	√	√	√	87.42	86.75