[1] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105.
[3] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252.
[4] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2020-03-06]. https://arxiv.org/pdf/1409.1556.pdf.
[5] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[6] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
[7] WU Y X, LIANG K, LIU Y, et al. The progress and trends of FPGA-based accelerators in deep learning[J]. Chinese Journal of Computers, 2019, 42(11): 2461-2480.
[8] CHEN T, DU Z, SUN N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[C]//Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2014: 269-284.
[9] JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]//Proceedings of the 44th Annual International Symposium on Computer Architecture. New York: ACM, 2017: 1-12.
[10] CHEN Y H, EMER J, SZE V, et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks[C]//Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture. Piscataway: IEEE, 2016: 367-379.
[11] QIU J, WANG J, YAO S, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field Programmable Gate Arrays. New York: ACM, 2016: 26-35.
[12] CHEN X, HAN Y, WANG Y. Communication lower bound in convolution accelerators[C]//Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture. Piscataway: IEEE, 2020: 529-541.
[13] GUO K, SUI L, QIU J, et al. Angel-Eye: a complete design flow for mapping CNN onto embedded FPGA[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(1): 35-47.
[14] BOSI B, BOIS G, SAVARIA Y. Reconfigurable pipelined 2-D convolvers for fast digital signal processing[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1999, 7(3): 299-308.
[15] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1-9.
[16] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[EB/OL]. [2020-10-10]. https://arxiv.org/pdf/1704.04861.pdf.
[17] GUO K, ZENG S, YU J, et al. A survey of FPGA-based neural network inference accelerators[J]. ACM Transactions on Reconfigurable Technology and Systems, 2019, 12(1): No. 2.
[18] HAN S, POOL J, TRAN J, et al. Learning both weights and connections for efficient neural networks[C]//Proceedings of the 2015 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 1135-1143.
[19] REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525.
[20] COURBARIAUX M, DAVID J P, BENGIO Y. Training deep neural networks with low precision multiplications[EB/OL]. [2020-10-10]. https://arxiv.org/pdf/1412.7024.pdf.
[21] HAN S, LIU X, MAO H, et al. EIE: efficient inference engine on compressed deep neural network[C]//Proceedings of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture. Piscataway: IEEE, 2016: 243-254.
[22] LIU Z, DOU Y, JIANG J, et al. Automatic code generation of convolutional neural networks in FPGA implementation[C]//Proceedings of the 2016 International Conference on Field-Programmable Technology. Piscataway: IEEE, 2016: 61-68.
[23] LI H, FAN X, JIAO L, et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks[C]//Proceedings of the 26th International Conference on Field Programmable Logic and Applications. Piscataway: IEEE, 2016: 1-9.
[24] SHEN J, HUANG Y, WANG Z, et al. Towards a uniform template-based architecture for accelerating 2D and 3D CNNs on FPGA[C]//Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: ACM, 2018: 97-106.