Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (7): 1987-1994. DOI: 10.11772/j.issn.1001-9081.2023071027

• Artificial Intelligence •

Deep network compression method based on low-rank decomposition and vector quantization

Dongwei WANG1,2,3, Baichen LIU1,2,3, Zhi HAN1,2, Yanmei WANG1,2,3, Yandong TANG1,2

  1. State Key Laboratory of Robotics (Shenyang Institute of Automation, Chinese Academy of Sciences), Shenyang, Liaoning 110016, China
    2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2023-07-30 Revised: 2023-09-18 Accepted: 2023-09-21 Online: 2023-10-26 Published: 2024-07-10
  • Contact: Zhi HAN
  • About author: WANG Dongwei, born in 1999 in Tangshan, Hebei, M. S. candidate. His research interests include deep network compression and knowledge transfer.
    LIU Baichen, born in 1994 in Jilin, Jilin, Ph. D. candidate. His research interests include deep learning and deep network compression.
    WANG Yanmei, born in 1996 in Weihai, Shandong, Ph. D. candidate. Her research interests include transfer learning and domain generalization.
    TANG Yandong, born in 1962 in Liaocheng, Shandong, Ph. D., research fellow. His research interests include robot vision, image processing, and pattern recognition.
    Corresponding author: HAN Zhi, born in 1983 in Shenyang, Liaoning, Ph. D., research fellow. His research interests include computer vision, matrix completion, and deep learning.
  • Supported by:
    National Key Research and Development Program (2020YFB1313400)

Abstract:

With the development of artificial intelligence, deep neural networks have become essential tools in various pattern recognition tasks. Because deep Convolutional Neural Networks (CNNs) have huge numbers of parameters and high computational complexity, deploying them on edge computing devices with limited computing resources and storage space is challenging. Therefore, deep network compression has become an important research topic in recent years. Low-rank decomposition and vector quantization are two important branches of deep network compression research; both aim to find a compact representation of the original network, thereby reducing the redundancy of network parameters. By establishing a joint compression framework, a deep network compression method based on low-rank decomposition and vector quantization, called Quantized Tensor Decomposition (QTD), was proposed. The method performs further quantization on top of the low-rank structure of the network to obtain a higher compression ratio. Experimental results of applying the proposed method to classical ResNet on the CIFAR-10 dataset show that QTD can compress the number of network parameters to 1% of the original with an accuracy loss of only 1.71 percentage points. Moreover, the proposed method was compared with the quantization-based method PQF (Permute, Quantize, and Fine-tune), the low-rank decomposition-based method TDNR (Tucker Decomposition with Nonlinear Response), and the pruning-based method CLIP-Q (Compression Learning by In-parallel Pruning-Quantization) on the large-scale ImageNet dataset. Experimental results show that QTD achieves better classification accuracy than these methods within the same compression range.

Key words: Convolutional Neural Network (CNN), tensor decomposition, vector quantization, model compression, image classification
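
To make the idea concrete, the following is a minimal sketch (in Python with NumPy) of the two steps that QTD combines: a convolutional kernel is first given a low-rank factorization, and one of the resulting factors is then vector-quantized with a small learned codebook. This is not the authors' implementation: a truncated SVD of the unfolded kernel stands in for the tensor decomposition used in the paper, and the rank r, sub-vector length d, and codebook size k are arbitrary illustrative values.

import numpy as np

rng = np.random.default_rng(0)

# Toy 3x3 convolution weight with shape (out_channels, in_channels, kh, kw);
# the values are random because this sketch only illustrates the pipeline.
W = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)

# Step 1: low-rank decomposition. The kernel is unfolded along its output
# channels and a truncated SVD is applied (a stand-in for the tensor
# decomposition used in the paper).
W_mat = W.reshape(W.shape[0], -1)          # (64, 288) unfolded kernel
U, S, Vt = np.linalg.svd(W_mat, full_matrices=False)
r = 16                                     # assumed target rank
A = U[:, :r] * S[:r]                       # (64, r) factor
B = Vt[:r, :]                              # (r, 288) factor

# Step 2: vector quantization of a factor with plain k-means.
def vector_quantize(M, d, k, iters=20):
    """Split M into length-d sub-vectors and learn a k-entry codebook."""
    vecs = M.reshape(-1, d)
    codebook = vecs[rng.choice(len(vecs), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every sub-vector to its nearest codeword
        dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # move each codeword to the mean of its assigned sub-vectors
        for j in range(k):
            members = vecs[assign == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return codebook, assign

codebook, codes = vector_quantize(B, d=4, k=64)    # assumed d and k
B_q = codebook[codes].reshape(B.shape)             # quantized factor

# Reconstruct an approximation of the original kernel from A and the
# quantized factor, and report the relative error.
W_approx = (A @ B_q).reshape(W.shape)
rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")

With these toy sizes, what would be stored after compression is the factor A, the 64-entry codebook, and the small integer codes rather than the full-precision kernel, which illustrates on a single layer the joint low-rank-plus-quantization saving that QTD pursues at network scale.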

CLC Number: