Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (9): 2449-2454. DOI: 10.11772/j.issn.1001-9081.2018020477

• Artificial Intelligence •

CNN quantization and compression strategy for edge computing applications

CAI Ruichu1, ZHONG Chunrong1, YU Yang1, CHEN Bingfeng1, LU Ye2, CHEN Yao1,3

  1. College of Computer Science, Guangdong University of Technology, Guangzhou Guangdong 510006, China;
    2. College of Computer and Control Engineering, Nankai University, Tianjin 300353, China;
    3. Advanced Digital Sciences Center, Singapore 138602, Singapore
  • Received: 2018-03-12  Revised: 2018-04-17  Online: 2018-09-10  Published: 2018-09-06
  • Contact: ZHONG Chunrong
  • About the authors: CAI Ruichu (1987—), male, born in Wenzhou, Zhejiang; Ph.D., professor; research interests: deep learning, causality. ZHONG Chunrong (1993—), male, born in Guangzhou, Guangdong; M.S. candidate; research interests: neural networks, deep learning optimization. YU Yang (1993—), male, born in Changde, Hunan; M.S. candidate; research interests: neural networks, deep learning optimization. CHEN Bingfeng (1983—), male, born in Shantou, Guangdong; Ph.D. candidate; research interests: public opinion analysis, data mining. LU Ye (1986—), male, born in Jilin, Jilin; Ph.D., assistant professor; research interests: embedded systems, deep learning. CHEN Yao (1987—), male, born in Songyuan, Jilin; Ph.D.; research interests: edge computing, deep learning.
  • Supported by:
    This work is partially supported by the NSFC-Guangdong Joint Foundation (U1501254), the Guangdong Science Fund for Distinguished Young Scholars (2014A030306004), and the Open Program of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (MJUKF201733).



Abstract: Focusing on the problem that the memory- and computation-intensive nature of Convolutional Neural Networks (CNN) limits their deployment on resource-constrained edge devices such as embedded platforms, a CNN compression method combining network weight pruning with data quantization tailored to the data types of embedded hardware platforms was proposed. Firstly, according to the weight distribution of each layer of the original CNN, a threshold-based pruning method was applied to eliminate the weights that have little impact on the network accuracy, removing redundant information from the model while preserving the important connections. Secondly, the bit-widths required by the weights and activations were analyzed based on the computational characteristics of the embedded platform, and a dynamic fixed-point quantization method was employed to reduce the bit-width of the network model. Finally, the network was fine-tuned to further compress the model size and reduce the computational cost while maintaining inference accuracy. The experimental results show that this method reduces the storage space of VGG-19 by 95.4% (nearly 22 times) while decreasing the accuracy by only 0.3 percentage points, achieving almost lossless compression. Meanwhile, evaluation on multiple network models shows that this method reduces model storage space by up to 96.12% (about 25 times) with an average accuracy change within 1.46 percentage points, which demonstrates that the proposed method compresses convolutional neural networks effectively.
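The threshold-based pruning step described in the abstract can be sketched as follows. This is a minimal illustration of magnitude pruning, assuming a per-layer threshold derived from the standard deviation of the weight distribution (a common heuristic; the `sensitivity` knob and function name are hypothetical, since the abstract does not give the authors' exact threshold rule):

```python
import numpy as np

def prune_by_threshold(weights, sensitivity=0.5):
    """Zero out weights whose magnitude falls below a per-layer threshold.

    The threshold scales with the standard deviation of this layer's
    weight distribution, so layers with wider distributions keep
    proportionally larger weights.
    """
    threshold = sensitivity * np.std(weights)
    mask = np.abs(weights) >= threshold  # True for the important connections
    return weights * mask, mask

# Example: a hypothetical 3x3x3 conv layer with 64 filters, flattened per filter
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 27))
pruned, mask = prune_by_threshold(w, sensitivity=0.5)
print(f"sparsity: {1.0 - mask.mean():.2%}")
```

In practice the surviving weights would then be fine-tuned, and pruning/retraining may be iterated, so that accuracy is recovered before quantization.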
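The dynamic fixed-point quantization step can likewise be sketched. This is a simple per-tensor round-and-clip scheme in which the position of the radix point adapts to each tensor's value range, mirroring the idea of choosing bit allocation per layer; the function name and the exact rounding policy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dynamic_fixed_point(x, bit_width=8):
    """Quantize a tensor to `bit_width`-bit dynamic fixed point.

    The fractional length is chosen per tensor so that the largest
    magnitude just fits in the available integer bits; one bit is
    reserved for the sign.
    """
    max_abs = float(np.max(np.abs(x)))
    int_len = int(np.ceil(np.log2(max_abs))) if max_abs > 0 else 0
    frac_len = bit_width - 1 - int_len          # bits to the right of the radix point
    scale = 2.0 ** frac_len
    qmin, qmax = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)  # integer codes
    return q / scale, frac_len

x = np.array([0.05, -0.09, 0.001])
deq, fl = dynamic_fixed_point(x, bit_width=8)
```

For small-magnitude weights the fractional length grows (here `fl` is 10 for 8-bit codes, since the integer length is negative), which is exactly what makes dynamic fixed point a better fit for CNN weight distributions than a fixed radix point.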

Key words: Convolutional Neural Network (CNN), edge computing, network pruning, data quantization, network compression
