[1] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105.
[3] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2020-01-20]. https://arxiv.org/pdf/1409.1556.pdf.
[4] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1-9.
[5] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[6] DENIL M, SHAKIBI B, DINH L, et al. Predicting parameters in deep learning[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2013: 2148-2156.
[7] SAINATH T N, KINGSBURY B, SINDHWANI V, et al. Low-rank matrix factorization for deep neural network training with high-dimensional output targets[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2013: 6655-6659.
[8] HAN S, MAO H, DALLY W J. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding[EB/OL]. [2019-05-20]. https://arxiv.org/pdf/1510.00149.pdf.
[9] HAN S, LIU X, MAO H, et al. EIE: efficient inference engine on compressed deep neural network[C]//Proceedings of the 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture. Piscataway: IEEE, 2016: 243-254.
[10] HARTIGAN J A, WONG M A. A K-means clustering algorithm[J]. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979, 28(1): 100-108.
[11] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size[EB/OL]. [2019-05-20]. https://arxiv.org/pdf/1602.07360.pdf.
[12] GYSEL P M. Ristretto: hardware-oriented approximation of convolutional neural networks[EB/OL]. [2019-05-20]. https://arxiv.org/pdf/1605.06402.pdf.
[13] RAJASEGARAN J, JAYASUNDARA V, JAYASEKARA S, et al. DeepCaps: going deeper with capsule networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 10717-10725.
[14] ZHAO R, SONG W, ZHANG W, et al. Accelerating binarized convolutional neural networks with software-programmable FPGAs[C]//Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: ACM, 2017: 15-24.
[15] WEI X, YU C H, ZHANG P, et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs[C]//Proceedings of the 54th Annual Design Automation Conference. New York: ACM, 2017: Article No. 29.
[16] AIMAR A, MOSTAFA H, CALABRESE E, et al. NullHop: a flexible convolutional neural network accelerator based on sparse representations of feature maps[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3): 644-656.
[17] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2019-04-08]. https://arxiv.org/pdf/1804.02767.pdf.
[18] IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 448-456.
[19] 施一飞. 对使用TensorRT加速AI深度学习推断效率的探索[J]. 科技视界, 2017(31): 26-27. (SHI Y F. Exploring the use of TensorRT to accelerate AI deep learning inference efficiency[J]. Science and Technology Vision, 2017(31): 26-27.)
[20] 余子健, 马德, 严晓浪, 等. 基于FPGA的卷积神经网络加速器[J]. 计算机工程, 2017, 43(1): 109-114, 119. (YU Z J, MA D, YAN X L, et al. FPGA-based accelerator for convolutional neural network[J]. Computer Engineering, 2017, 43(1): 109-114, 119.)
[21] 魏浚峰, 王东, 山丹. 基于FPGA的卷积神经网络加速器设计与实现[J]. 中国集成电路, 2019, 28(7): 18-22, 67. (WEI J F, WANG D, SHAN D. Design and implementation of convolutional neural network accelerator based on FPGA[J]. China Integrated Circuit, 2019, 28(7): 18-22, 67.)