Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (9): 2449-2454. DOI: 10.11772/j.issn.1001-9081.2018020477

• Artificial Intelligence •

CNN quantization and compression strategy for edge computing applications

CAI Ruichu1, ZHONG Chunrong1, YU Yang1, CHEN Bingfeng1, LU Ye2, CHEN Yao1,3

  1. College of Computer Science, Guangdong University of Technology, Guangzhou Guangdong 510006, China;
    2. College of Computer and Control Engineering, Nankai University, Tianjin 300353, China;
    3. Advanced Digital Sciences Center, Singapore 138602, Singapore
  • Received: 2018-03-12  Revised: 2018-04-17  Online: 2018-09-10  Published: 2018-09-06
  • Contact: ZHONG Chunrong
  • About the authors: CAI Ruichu (1987—), male, born in Wenzhou, Zhejiang; Ph.D., professor; research interests: deep learning, causality. ZHONG Chunrong (1993—), male, born in Guangzhou, Guangdong; M.S. candidate; research interests: neural networks, deep learning optimization. YU Yang (1993—), male, born in Changde, Hunan; M.S. candidate; research interests: neural networks, deep learning optimization. CHEN Bingfeng (1983—), male, born in Shantou, Guangdong; Ph.D. candidate; research interests: public opinion analysis, data mining. LU Ye (1986—), male, born in Jilin, Jilin; Ph.D., assistant professor; research interests: embedded systems, deep learning. CHEN Yao (1987—), male, born in Songyuan, Jilin; Ph.D.; research interests: edge computing, deep learning.
  • Supported by:
    This work is partially supported by the NSFC-Guangdong Joint Foundation (U1501254), the Guangdong Science Fund for Distinguished Young Scholars (2014A030306004), and the Open Program of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (MJUKF201733).



Abstract: Focusing on the problem that the memory- and computation-intensive nature of Convolutional Neural Networks (CNN) limits their deployment on resource-constrained edge devices such as embedded platforms, a CNN compression method combining network weight pruning with data quantization tailored to the data types of embedded hardware platforms was proposed. Firstly, according to the weight distribution of each layer of the original CNN, a threshold-based pruning method was applied to eliminate the weights that have little impact on the network accuracy, removing redundant information from the model while preserving the important connections. Secondly, the bit-widths required by the weights and activations were analyzed based on the computational characteristics of the embedded platform, and a dynamic fixed-point quantization method was employed to reduce the bit-width of the network model. Finally, the network was fine-tuned to further compress the model size and reduce the computational cost while maintaining inference accuracy. The experimental results show that this method reduces the storage space of VGG-19 by 95.4% (nearly 22 times) while decreasing the accuracy by only 0.3 percentage points, achieving almost lossless compression. Meanwhile, evaluation on multiple network models shows that this method reduces model storage space by up to 96.12% (about 25 times) with an average accuracy change within 1.46 percentage points, which demonstrates that the proposed method compresses convolutional neural networks effectively.
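The threshold-based pruning step described in the abstract can be sketched as follows. This is a minimal illustration of magnitude pruning, assuming a per-layer threshold derived from the standard deviation of the weight distribution (a common heuristic; the `sensitivity` knob and function name are hypothetical, since the abstract does not give the authors' exact threshold rule):

```python
import numpy as np

def prune_by_threshold(weights, sensitivity=0.5):
    """Zero out weights whose magnitude falls below a per-layer threshold.

    The threshold scales with the standard deviation of this layer's
    weight distribution, so layers with wider distributions keep
    proportionally larger weights.
    """
    threshold = sensitivity * np.std(weights)
    mask = np.abs(weights) >= threshold  # True for the important connections
    return weights * mask, mask

# Example: a hypothetical 3x3x3 conv layer with 64 filters, flattened per filter
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(64, 27))
pruned, mask = prune_by_threshold(w, sensitivity=0.5)
print(f"sparsity: {1.0 - mask.mean():.2%}")
```

In practice the surviving weights would then be fine-tuned, and pruning/retraining may be iterated, so that accuracy is recovered before quantization.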
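The dynamic fixed-point quantization step can likewise be sketched. This is a simple per-tensor round-and-clip scheme in which the position of the radix point adapts to each tensor's value range, mirroring the idea of choosing bit allocation per layer; the function name and the exact rounding policy are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dynamic_fixed_point(x, bit_width=8):
    """Quantize a tensor to `bit_width`-bit dynamic fixed point.

    The fractional length is chosen per tensor so that the largest
    magnitude just fits in the available integer bits; one bit is
    reserved for the sign.
    """
    max_abs = float(np.max(np.abs(x)))
    int_len = int(np.ceil(np.log2(max_abs))) if max_abs > 0 else 0
    frac_len = bit_width - 1 - int_len          # bits to the right of the radix point
    scale = 2.0 ** frac_len
    qmin, qmax = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)  # integer codes
    return q / scale, frac_len

x = np.array([0.05, -0.09, 0.001])
deq, fl = dynamic_fixed_point(x, bit_width=8)
```

For small-magnitude weights the fractional length grows (here `fl` is 10 for 8-bit codes, since the integer length is negative), which is exactly what makes dynamic fixed point a better fit for CNN weight distributions than a fixed radix point.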

Key words: Convolutional Neural Network (CNN), edge computing, network pruning, data quantization, network compression
