Deep network compression method based on low-rank decomposition and vector quantization

doi:10.11772/j.issn.1001-9081.2023071027

Journal of Computer Applications, 2024, Vol. 44, Issue 7: 1987-1994.

• Artificial intelligence •


Dongwei WANG¹,²,³, Baichen LIU¹,²,³, Zhi HAN¹,², Yanmei WANG¹,²,³, Yandong TANG¹,²

1. State Key Laboratory of Robotics (Shenyang Institute of Automation, Chinese Academy of Sciences), Shenyang, Liaoning 110016, China
2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China
3. University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2023-07-30; Revised: 2023-09-18; Accepted: 2023-09-21; Online: 2023-10-26; Published: 2024-07-10
Contact: Zhi HAN
About authors: WANG Dongwei, born in 1999 in Tangshan, Hebei, M. S. candidate. His research interests include deep network compression and knowledge transfer.
LIU Baichen, born in 1994 in Jilin, Jilin, Ph. D. candidate. His research interests include deep learning and deep network compression.
WANG Yanmei, born in 1996 in Weihai, Shandong, Ph. D. candidate. Her research interests include transfer learning and domain generalization.
TANG Yandong, born in 1962 in Liaocheng, Shandong, Ph. D., research fellow. His research interests include robot vision, image processing, and pattern recognition.
HAN Zhi (corresponding author), born in 1983 in Shenyang, Liaoning, Ph. D., research fellow. His research interests include computer vision, matrix completion, and deep learning.
Supported by: National Key Research and Development Program (2020YFB1313400)


Abstract


With the development of artificial intelligence, deep neural networks have become an essential tool in various pattern recognition tasks. However, deep Convolutional Neural Networks (CNNs) have a huge number of parameters and high computational complexity, so deploying them on edge computing devices with limited storage space and computing resources is challenging. Deep network compression has therefore become an important research topic in recent years. Low-rank decomposition and vector quantization are two important branches of deep network compression; both seek a compact representation of the original network and thereby reduce the redundancy of its parameters. By establishing a joint compression framework, a deep network compression method based on low-rank decomposition and vector quantization, Quantized Tensor Decomposition (QTD), was proposed. It performs further quantization on top of the low-rank structure of the network to obtain a higher compression ratio. Experimental results of the classical ResNet and the proposed method on the CIFAR-10 dataset show that QTD can compress the number of network parameters to 1% of the original with an accuracy drop of only 1.71 percentage points. Moreover, the proposed method was compared with the quantization-based method PQF (Permute, Quantize, and Fine-tune), the low-rank decomposition-based method TDNR (Tucker Decomposition with Nonlinear Response), and the pruning-based method CLIP-Q (Compression Learning by In-parallel Pruning-Quantization) on the large-scale ImageNet dataset. The results show that QTD achieves better classification accuracy within the same compression range.
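
The two-stage idea in the abstract (first impose a low-rank structure, then quantize what remains) can be illustrated with a minimal NumPy sketch. This is not the authors' QTD implementation: it assumes a truncated SVD as the low-rank step and plain k-means as the vector quantizer, and the layer shape, rank, block size, and codebook size are all hypothetical values chosen for illustration.

```python
import numpy as np


def truncated_svd(weight, rank):
    """Factor weight (m x n) into A (m x rank) and B (rank x n), A @ B ~= weight."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def kmeans(vectors, num_codes, iters=20, seed=0):
    """Plain k-means over row vectors; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), size=num_codes, replace=False)
    codebook = vectors[start].copy()
    for _ in range(iters):
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for k in range(num_codes):
            members = vectors[assign == k]
            if len(members) > 0:  # keep the old codeword if a cluster empties
                codebook[k] = members.mean(axis=0)
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, dists.argmin(axis=1)


def vector_quantize(matrix, block_size, num_codes):
    """Replace every length-block_size sub-vector by its nearest codeword."""
    blocks = matrix.reshape(-1, block_size)
    codebook, assign = kmeans(blocks, num_codes)
    return codebook[assign].reshape(matrix.shape), assign


def stored_bits(num_elements, block_size, num_codes, float_bits=32):
    """Bits for one float codebook plus one index per sub-vector."""
    index_bits = int(np.ceil(np.log2(num_codes)))
    codebook_bits = num_codes * block_size * float_bits
    return codebook_bits + (num_elements // block_size) * index_bits


# Hypothetical layer: 64 filters, each a flattened 64 x 3 x 3 kernel.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 576))

# Step 1: low-rank structure (assumed here: truncated SVD with rank 32).
a, b = truncated_svd(weight, rank=32)

# Step 2: quantize each low-rank factor with its own small codebook.
a_q, _ = vector_quantize(a, block_size=4, num_codes=16)
b_q, _ = vector_quantize(b, block_size=4, num_codes=16)
approx = a_q @ b_q

original_bits = weight.size * 32
compressed_bits = stored_bits(a.size, 4, 16) + stored_bits(b.size, 4, 16)
ratio = original_bits / compressed_bits
rel_error = np.linalg.norm(weight - approx) / np.linalg.norm(weight)
print(f"compression ratio: {ratio:.1f}, relative error: {rel_error:.3f}")
```

The storage accounting shows why stacking the two steps raises the compression ratio: the quantizer only has to encode the smaller low-rank factors. The random matrix used here has no low-rank structure, so its reconstruction error is large; the sketch says nothing about the accuracy figures reported for QTD.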

Key words: Convolutional Neural Network (CNN), tensor decomposition, vector quantization, model compression, image classification


CLC Number: TP183

Dongwei WANG, Baichen LIU, Zhi HAN, Yanmei WANG, Yandong TANG. Deep network compression method based on low-rank decomposition and vector quantization[J]. Journal of Computer Applications, 2024, 44(7): 1987-1994.


Figures/Tables 10

Fig. 1 Overall framework of proposed method

Tab. 1 Low-rank structures of different convolutional layers in ResNet

Original layer	Input/output channels	Kernel size	Decomposed layer	Input/output channels	Kernel size
conv1	64/64	3×3	conv1.1	64/32	1×1
			conv1.2	32/32	3×3
			conv1.3	32/64	1×1
conv2	64/128	3×3	conv2.1	64/32	1×1
			conv2.2	32/64	3×3
			conv2.3	64/128	1×1
conv3	128/256	3×3	conv3.1	128/64	1×1
			conv3.2	64/128	3×3
			conv3.3	128/256	1×1
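
To make the saving from the structure in Tab. 1 concrete, the short sketch below counts convolution weights before and after decomposition. The channel numbers and kernel sizes are copied from the table; ignoring bias terms and batch-normalization parameters is an assumption of this sketch, and `conv_params` is a hypothetical helper name.

```python
def conv_params(c_in, c_out, kernel):
    """Weight count of a kernel x kernel convolution (bias ignored, an assumption)."""
    return c_in * c_out * kernel * kernel


# (c_in, c_out, kernel) for each original layer and its three-layer replacement,
# copied from Tab. 1.
layers = {
    "conv1": ((64, 64, 3), [(64, 32, 1), (32, 32, 3), (32, 64, 1)]),
    "conv2": ((64, 128, 3), [(64, 32, 1), (32, 64, 3), (64, 128, 1)]),
    "conv3": ((128, 256, 3), [(128, 64, 1), (64, 128, 3), (128, 256, 1)]),
}

results = {}
for name, (original, parts) in layers.items():
    before = conv_params(*original)
    after = sum(conv_params(*p) for p in parts)
    results[name] = (before, after)
    print(f"{name}: {before} -> {after} weights ({before / after:.2f}x fewer)")
```

Under these assumptions the low-rank structure alone removes roughly 60% of the weights in each listed layer (about 2.8x for conv1 and 2.6x for conv2 and conv3), which is far below the overall ratios reported in the later tables; the rest of the compression has to come from the quantization step.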

Fig. 2 Parent-child relationship in low-rank structure of ResNet-18

Tab. 2 Compression parameter setting of ResNet

Model	Compression range	$m$	$d_C$	$d_F$	$d_{fc}$
ResNet-18	Small	1.6	$D^2$	2	2
	Medium	2.0	$D^2$	4	4
	Large	4.0	$D^2$	4	4
ResNet-50	Small	2.0	$D^2$	4	4
	Medium	4.0	$D^2$	8	4
	Large	4.0	$2D^2$	8	8


Fig. 3 Experimental results of ResNet-18 on CIFAR-10

Fig. 4 Experimental results of ResNet-18 on CIFAR-100

Tab. 3 Comparison of FLOPs in compressed ResNet-18

Layer	PQF (compression ratio 23.01)/MFLOPs	QTD (compression ratio 25.38)/MFLOPs
Total	556.65	213.24
conv1	1.90	1.90
Res1	151.52	55.05
Res2	134.55	52.23
Res3	134.38	52.07
Res4	134.30	51.99
Linear	0.01	0.01
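
As a quick consistency check on Tab. 3, the sketch below re-adds the per-layer FLOPs and derives the PQF-to-QTD ratio. All numbers are copied from the table; the variable names are illustrative only.

```python
# Per-layer FLOPs in MFLOPs, copied from Tab. 3.
pqf = {"conv1": 1.90, "Res1": 151.52, "Res2": 134.55,
       "Res3": 134.38, "Res4": 134.30, "Linear": 0.01}
qtd = {"conv1": 1.90, "Res1": 55.05, "Res2": 52.23,
       "Res3": 52.07, "Res4": 51.99, "Linear": 0.01}
reported_total = {"PQF": 556.65, "QTD": 213.24}

pqf_sum = sum(pqf.values())
qtd_sum = sum(qtd.values())

# Ratio of the reported totals, and the same ratio layer by layer.
flops_ratio = reported_total["PQF"] / reported_total["QTD"]
stage_ratio = {layer: pqf[layer] / qtd[layer] for layer in pqf}

print(f"summed totals: PQF {pqf_sum:.2f}, QTD {qtd_sum:.2f}")
print(f"PQF / QTD FLOPs: {flops_ratio:.2f}")
for layer, r in stage_ratio.items():
    print(f"  {layer}: {r:.2f}")
```

The per-layer values add up to the reported totals within rounding (0.01 MFLOPs), and the totals imply that the QTD-compressed ResNet-18 needs roughly 2.6 times fewer FLOPs than the PQF-compressed one at a comparable compression ratio. The first convolution and the linear layer are identical in both columns, so the whole difference comes from the residual stages.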

Tab. 4 Experimental results of ResNet-18 on ImageNet

Method	Compression ratio	Original accuracy/%	Compressed accuracy/%	Accuracy drop/percentage points
BGD	25.12	69.1	67.39	1.71
PQF	25.33	69.1	67.88	1.22
QTD	24.58	69.1	65.27	3.83
ABC-Net	32.05	69.1	62.80	6.30
BWN	32.17	69.1	60.78	8.32
LR-Net	31.89	69.1	59.90	9.20
BGD	35.21	69.1	64.12	4.98
PQF	35.14	69.1	65.23	3.87
QTD	30.91	69.1	64.94	4.16
BGD	43.23	69.1	61.17	7.93
PQF	43.22	69.1	63.33	5.77
QTD	43.61	69.1	61.22	7.88
PQF	56.74	69.1	59.87	9.23
QTD	58.13	69.1	60.71	8.39
PQF	59.53	69.1	58.92	10.18
QTD	60.67	69.1	59.33	9.77

Tab. 5 Experimental results of ResNet-50 on ImageNet

Method	Compression ratio	Original accuracy/%	Compressed accuracy/%	Accuracy drop/percentage points
CLIP-Q	14.90	76.15	73.77	2.38
HAQ	15.20	76.15	70.63	5.52
DC	15.18	76.15	68.90	7.25
BGD	15.20	76.15	74.81	1.34
PQF	16.82	76.15	75.42	0.73
QTD	16.48	76.15	75.26	0.89
BGD	25.91	76.15	71.53	4.62
PQF	25.93	76.15	73.64	2.51
QTD	23.54	76.15	71.75	4.40
PQF	29.37	76.15	70.22	5.93
QTD	30.16	76.15	70.74	5.41
PQF	32.16	76.15	69.13	7.02
QTD	33.57	76.15	69.85	6.30

Tab. 6 Ablation study results

Experiment No.	Method	Compression ratio	Compressed accuracy/%	Accuracy drop/percentage points
1	Global TKD	28.69	90.24	4.85
2	VQ	28.48	92.35	2.74
3	QTD w/o permutation	29.58	93.29	1.80
4	QTD	29.58	94.02	1.07

