Deep pipeline 5×5 convolution method based on two-dimensional Winograd algorithm

doi:10.11772/j.issn.1001-9081.2020101668

Journal of Computer Applications, 2021, Vol. 41, Issue (8): 2258-2264. DOI: 10.11772/j.issn.1001-9081.2020101668

Special Issue: Advanced Computing

HUANG Chengcheng, DONG Xiaoxiao, LI Zhao

School of Computer Science and Technology, Shandong University of Technology, Zibo Shandong 255049, China

Received: 2020-10-28; Revised: 2021-01-27; Online: 2021-02-05; Published: 2021-08-10
Supported by:
This work is partially supported by the Natural Science Foundation of Shandong Province (ZR2018LF002), the Development Program of Youth Innovation Teams in Colleges and Universities of Shandong Province (2019KJN048), and the University and City Integration Development Program of Zibo City (2018ZBXC021).

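Corresponding author: LI Zhao
About the authors: HUANG Chengcheng (born 1997), male, from Renshou, Sichuan, M.S. candidate, CCF member; his research interests include computer architecture. DONG Xiaoxiao (born 1996), female, from Jining, Shandong, M.S. candidate; her research interests include object detection. LI Zhao (born 1983), male, from Zibo, Shandong, lecturer, Ph.D.; his research interests include approximate computing and machine learning.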

Abstract

Abstract: Two-dimensional Winograd convolution suffers from high memory bandwidth demand, high computational complexity, a long design and exploration cycle, and inter-layer computing delay when convolutions are cascaded. To address these problems, a double-buffer 5×5 convolutional layer design method based on the two-dimensional Winograd algorithm is proposed. First, a column buffer structure is used to lay out the data so that the overlapping data between adjacent blocks are reused, which reduces the memory bandwidth demand. Then, the repeated intermediate results in the addition steps of the Winograd algorithm are precisely identified and reused to reduce the number of additions, which lowers the energy consumption and design area of the accelerator system. Finally, a 6-stage pipeline structure is designed according to the computation flow of the Winograd algorithm, achieving efficient 5×5 convolution. Experimental results show that, with the prediction accuracy of the Convolutional Neural Network (CNN) essentially unaffected, the proposed 5×5 convolution method reduces the number of multiplications by 83% compared with traditional convolution and achieves a speedup of 5.82. Compared with building a 5×5 convolution by cascading 3×3 two-dimensional Winograd convolutions, the proposed method reduces the number of multiplications by 12%, the memory bandwidth demand by about 24.2%, and the computing time by 20%.

Key words: Convolutional Neural Network (CNN), Field Programmable Gate Array (FPGA), Winograd algorithm, double-buffer, deep pipeline
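As a rough illustration of where the multiplication savings come from, the following sketch counts multiplications per output tile for direct convolution and for a two-dimensional Winograd algorithm F(m×m, r×r), using the standard result that the latter needs (m + r − 1)² element-wise multiplications per tile. The output tile size m is an assumption here, since the abstract does not state the one used in the paper, and the function names are hypothetical. The figures reported in the abstract are the authors' measurements and are not reproduced by this count.

```python
def direct_mults(m: int, r: int) -> int:
    """Multiplications to compute an m x m output tile directly with an r x r kernel."""
    return m * m * r * r


def winograd_mults(m: int, r: int) -> int:
    """Element-wise multiplications for 2D Winograd F(m x m, r x r).

    The transformed input tile has (m + r - 1) x (m + r - 1) elements,
    and each one is multiplied once with the transformed kernel.
    """
    n = m + r - 1
    return n * n


def mult_reduction(m: int, r: int) -> float:
    """Fraction of multiplications saved relative to direct convolution."""
    return 1.0 - winograd_mults(m, r) / direct_mults(m, r)


def shared_columns(r: int) -> int:
    """Input columns shared by horizontally adjacent tiles.

    Tiles are (m + r - 1) wide and advance by m columns, so neighbours
    overlap in r - 1 columns; these are the data a column buffer could reuse.
    """
    return r - 1


if __name__ == "__main__":
    r = 5  # 5x5 kernel, as in the paper; the tile sizes below are illustrative only
    for m in (2, 3, 4):
        print(
            f"F({m}x{m}, {r}x{r}): direct={direct_mults(m, r)}, "
            f"winograd={winograd_mults(m, r)}, "
            f"saved={mult_reduction(m, r):.1%}, "
            f"shared columns={shared_columns(r)}"
        )
```

The count covers only the element-wise multiplication stage. The additions in the input, kernel, and output transforms are extra work, which is why reusing repeated intermediate addition results matters in a hardware design.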

CLC Number: TP302.1

HUANG Chengcheng, DONG Xiaoxiao, LI Zhao. Deep pipeline 5×5 convolution method based on two-dimensional Winograd algorithm[J]. Journal of Computer Applications, 2021, 41(8): 2258-2264.

