Deep pipeline 5×5 convolution method based on two-dimensional Winograd algorithm

doi:10.11772/j.issn.1001-9081.2020101668

Journal of Computer Applications, 2021, Vol. 41, Issue (8): 2258-2264. DOI: 10.11772/j.issn.1001-9081.2020101668

Special topic: Advanced Computing

HUANG Chengcheng, DONG Xiaoxiao, LI Zhao

School of Computer Science and Technology, Shandong University of Technology, Zibo Shandong 255049, China

Received: 2020-10-28 Revised: 2021-01-27 Online: 2021-02-05 Published: 2021-08-10
Corresponding author: LI Zhao
About the authors: HUANG Chengcheng (born 1997), male, from Renshou, Sichuan, M.S. candidate, CCF member; main research interest: computer architecture. DONG Xiaoxiao (born 1996), female, from Jining, Shandong, M.S. candidate; main research interest: object detection. LI Zhao (born 1983), male, from Zibo, Shandong, lecturer, Ph.D.; main research interests: approximate computing and machine learning.
Supported by:
This work is partially supported by the Natural Science Foundation of Shandong Province (ZR2018LF002), the Development Program of Youth Innovation Teams in Colleges and Universities of Shandong Province (2019KJN048), and the University and City Integration Development Program of Zibo City (2018ZBXC021).

Abstract

Abstract: Two-dimensional Winograd convolution suffers from high memory bandwidth demand, high computational complexity, a long design exploration cycle, and inter-layer computing delay when convolutions are cascaded. To address these problems, a double-buffer 5×5 convolutional layer design method based on the two-dimensional Winograd algorithm is proposed. First, a column buffer structure is used to lay out the data so that the overlapping data between adjacent blocks are reused, which reduces the memory bandwidth demand. Then, the repeated intermediate results in the addition steps of the Winograd algorithm are precisely searched and reused to reduce the number of additions, which decreases the energy consumption and the design area of the accelerator system. Finally, a 6-stage pipeline structure is designed according to the computation flow of the Winograd algorithm, giving an efficient implementation of 5×5 convolution. Experimental results show that, with the prediction accuracy of the Convolutional Neural Network (CNN) essentially unaffected, the proposed 5×5 convolution method reduces the number of multiplications by 83% compared with traditional convolution and achieves a speedup of 5.82. Compared with building a 5×5 convolution by cascading 3×3 two-dimensional Winograd convolutions, the proposed method reduces the number of multiplications by 12%, the memory bandwidth demand by about 24.2%, and the computing time by 20%.
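To make the multiplication savings concrete, the following is a minimal sketch of a two-dimensional Winograd tile computation. The abstract gives neither the transform matrices nor the tile size used for the 5×5 case, so the sketch uses the widely published F(2×2, 3×3) matrices, which correspond to the building block of the cascaded 3×3 baseline and are not the authors' 5×5 design. The function names `winograd_tile_2x2_3x3`, `direct_tile`, `winograd_mults`, and `direct_mults` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd transform matrices.
# Assumption: 3x3 case shown for illustration; the paper's 5x5 matrices are not in the abstract.
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)


def winograd_tile_2x2_3x3(d, g):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    U = G @ g @ G.T        # kernel transform, 4x4
    V = BT @ d @ BT.T      # input transform, 4x4
    M = U * V              # 16 element-wise multiplications
    return AT @ M @ AT.T   # output transform, 2x2


def direct_tile(d, g, m):
    """Reference sliding-window correlation producing an m x m output tile."""
    r = g.shape[0]
    return np.array([[np.sum(d[i:i + r, j:j + r] * g) for j in range(m)]
                     for i in range(m)])


def winograd_mults(m, r):
    """Element-wise multiplications per tile for 2D Winograd F(m x m, r x r)."""
    return (m + r - 1) ** 2


def direct_mults(m, r):
    """Multiplications per m x m output tile for direct r x r convolution."""
    return m * m * r * r


rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
assert np.allclose(winograd_tile_2x2_3x3(d, g), direct_tile(d, g, 2))

print(winograd_mults(2, 3), direct_mults(2, 3))  # prints "16 36"
print(winograd_mults(4, 5), direct_mults(4, 5))  # prints "64 400"
```

The second printed line shows that a hypothetical 4×4 output tile with a 5×5 kernel would need 64 rather than 400 multiplications. Whether this is the tile size behind the reported 83% reduction is not stated in the abstract.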

Key words: Convolutional Neural Network (CNN), Field Programmable Gate Array (FPGA), Winograd algorithm, double-buffer, deep pipeline

CLC number: TP302.1

HUANG Chengcheng, DONG Xiaoxiao, LI Zhao. Deep pipeline 5×5 convolution method based on two-dimensional Winograd algorithm[J]. Journal of Computer Applications, 2021, 41(8): 2258-2264.

