Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (8): 2258-2264. DOI: 10.11772/j.issn.1001-9081.2020101668

Special Issue: Advanced computing

• Advanced computing

Deep pipeline 5×5 convolution method based on two-dimensional Winograd algorithm

HUANG Chengcheng, DONG Xiaoxiao, LI Zhao   

  1. School of Computer Science and Technology, Shandong University of Technology, Zibo Shandong 255049, China
  • Received: 2020-10-28 Revised: 2021-01-27 Online: 2021-08-10 Published: 2021-02-05
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Shandong Province (ZR2018LF002), the Development Program of Youth Innovation Teams in Colleges and Universities of Shandong Province (2019KJN048), and the University and City Integration Development Program of Zibo City (2018ZBXC021).

基于二维Winograd算法的深流水线5×5卷积方法

黄程程, 董霄霄, 李钊   

  1. 山东理工大学 计算机科学与技术学院, 山东 淄博 255049
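  • Corresponding author: LI Zhao
  • About the authors: HUANG Chengcheng (born 1997), male, from Renshou, Sichuan, M.S. candidate, CCF member; main research interest: computer architecture. DONG Xiaoxiao (born 1996), female, from Jining, Shandong, M.S. candidate; main research interest: object detection. LI Zhao (born 1983), male, from Zibo, Shandong, lecturer, Ph.D.; main research interests: approximate computing and machine learning.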
  • 基金资助:
    山东省自然科学基金资助项目(ZR2018LF002);山东省高等学校青年创新团队发展计划项目(2019KJN048);淄博市校城融合发展计划项目(2018ZBXC021)。

Abstract: To address the high memory bandwidth demand, high computational complexity, long design and exploration cycle, and inter-layer computing delay of cascaded convolutions in the two-dimensional Winograd convolution algorithm, a double-buffer 5×5 convolutional layer design method based on the two-dimensional Winograd algorithm is proposed. First, a column buffer structure is used to lay out the data so that the overlapping data between adjacent blocks are reused, which reduces the memory bandwidth demand. Then, the intermediate results that recur in the addition steps of the Winograd algorithm are precisely identified and reused, which reduces the number of additions and thereby the energy consumption and design area of the accelerator system. Finally, a 6-stage pipeline structure is designed following the computation flow of the Winograd algorithm, yielding an efficient 5×5 convolution. Experimental results show that, with the prediction accuracy of the Convolutional Neural Network (CNN) essentially unaffected, the proposed 5×5 convolution method reduces the number of multiplications by 83% compared with traditional convolution and achieves a speedup of 5.82; compared with cascading 3×3 two-dimensional Winograd convolutions to form a 5×5 convolution, it reduces the number of multiplications by 12%, the memory bandwidth demand by about 24.2%, and the computing time by 20%.

Key words: Convolutional Neural Network (CNN), Field Programmable Gate Array (FPGA), Winograd algorithm, double-buffer, deep pipeline
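
The multiplication savings and the overlap between adjacent blocks mentioned in the abstract can be illustrated with a small counting sketch. It relies only on the standard property that a two-dimensional Winograd convolution F(m×m, r×r) needs (m+r-1)² multiplications per output tile. The function names and the output tile size m are illustrative assumptions; the abstract does not state the tile size, so the figures below are not meant to reproduce the reported 83%.

```python
def direct_mults(m: int, r: int) -> int:
    """Multiplications to compute an m x m output tile directly with an r x r kernel."""
    return (m * m) * (r * r)


def winograd_mults(m: int, r: int) -> int:
    """Multiplications for 2D Winograd F(m x m, r x r).

    One multiplication per element of the transformed (m + r - 1) x (m + r - 1) tile.
    """
    n = m + r - 1
    return n * n


def mult_reduction(m: int, r: int) -> float:
    """Fraction of multiplications saved by Winograd relative to direct convolution."""
    return 1.0 - winograd_mults(m, r) / direct_mults(m, r)


def overlap_columns(m: int, r: int) -> int:
    """Input columns shared by two horizontally adjacent tiles.

    Input tiles are (m + r - 1) wide and advance by m columns, so r - 1 columns
    overlap; this is the data a column buffer could keep instead of re-reading.
    """
    return (m + r - 1) - m


if __name__ == "__main__":
    r = 5  # 5x5 kernel, as in the paper
    for m in (2, 4):  # hypothetical output tile sizes, not taken from the paper
        print(m, direct_mults(m, r), winograd_mults(m, r),
              round(mult_reduction(m, r), 3), overlap_columns(m, r))
```

For a 5×5 kernel, each pair of adjacent input tiles shares 4 columns regardless of m, which is why reusing buffered columns lowers the memory bandwidth demand.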

摘要: 针对二维Winograd卷积算法中存储器带宽需求过高、计算复杂度高、设计探索周期漫长、级联的卷积存在层间计算延迟等问题,提出一种基于二维Winograd算法的双缓冲区5×5卷积层设计方法。首先使用列缓冲结构完成数据布局,以重用相邻分块之间的重叠数据,降低存储器带宽需求;然后精确搜索并复用Winograd算法加法计算过程中重复的中间计算结果,来降低加法运算量,从而减小加速器系统的能耗开销和设计面积;最后根据Winograd算法计算过程来完成6级流水线结构的设计,并实现针对5×5卷积的高效率计算。实验结果表明,这种5×5卷积的计算方法在基本不影响卷积神经网络(CNN)预测准确率的前提下,与传统卷积相比降低了83%的乘法运算量,加速倍率为5.82;该方法与级联3×3二维Winograd卷积组成5×5卷积的方法相比降低了12%的乘法运算量,降低了约24.2%的存储器带宽需求,并减少了20%的运算时间。

关键词: 卷积神经网络, 现场可编程逻辑门阵列, Winograd算法, 双缓冲区, 深流水线
