Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (3): 669-676.DOI: 10.11772/j.issn.1001-9081.2020060994

Special Issue: Artificial Intelligence

• Artificial intelligence •

YOLOv3 compression and acceleration based on ZYNQ platform

GUO Wenxu1, SU Yuanqi1, LIU Yuehu2   

1. Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an Shaanxi 710049, China;
    2. College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an Shaanxi 710049, China
  • Received:2020-07-09 Revised:2020-11-12 Online:2021-03-10 Published:2020-12-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61973245).


  • Corresponding author: SU Yuanqi
  • About the authors: GUO Wenxu (1996-), male, born in Yangling, Shaanxi, M. S. candidate, research interests: computer vision and deep learning; SU Yuanqi (1982-), male, born in Rugao, Jiangsu, Ph. D., lecturer, CCF member, research interests: computer vision and pattern recognition; LIU Yuehu (1962-), male, born in Hequ, Shanxi, Ph. D., professor, research interests: computer vision and pattern recognition.

Abstract: Object detection networks with high accuracy are hard to deploy directly on end devices such as vehicles and drones because of the sharp increase in their parameters and computational cost. To solve this problem, starting from both network compression and computation acceleration, a new compression scheme for residual networks was proposed to compress YOLOv3 (You Only Look Once v3), and the compressed network was then accelerated on the ZYNQ platform. Firstly, a network compression algorithm covering both network pruning and network quantization was proposed. For network pruning, a strategy for the residual structure was introduced to divide pruning into two granularities: channel pruning and residual connection pruning, which overcame the inability of channel pruning to handle residual connections and further reduced the number of model parameters. For network quantization, a relative entropy-based simulated quantization was used to quantize the parameters channel by channel; the parameter distribution and the information loss caused by quantization were collected online, helping to choose the quantization strategy that minimizes the precision loss. Secondly, an 8-bit convolution acceleration module was designed and optimized on the ZYNQ platform: the on-chip cache structure was optimized and the Winograd algorithm was applied to accelerate the compressed YOLOv3. Experimental results show that the proposed scheme yields a smaller model than YOLOv3-tiny while improving detection accuracy by 7 percentage points. Meanwhile, the hardware acceleration method achieves a higher energy efficiency ratio on the ZYNQ platform than on other platforms, facilitating the actual deployment of YOLOv3 and other residual networks on ZYNQ end devices.
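The relative entropy-based calibration described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the bin counts, and the linear scan over candidate clipping thresholds are our assumptions; only the core idea (per-channel simulated INT8 quantization, keeping the threshold whose quantize-dequantize round trip minimizes the KL divergence against the original distribution) comes from the text.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Relative entropy D(p || q) between two histograms (smoothed)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.where(p > 0, p * np.log((p + eps) / (q + eps)), 0.0)))

def simulate_int8(w, threshold):
    """Simulated quantization: clip to [-threshold, threshold], round to
    int8 levels, then dequantize back to float."""
    scale = threshold / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

def best_threshold(w, num_candidates=64, num_bins=128):
    """Scan candidate clipping thresholds and keep the one whose simulated
    INT8 round trip loses the least information (minimum KL divergence)."""
    max_abs = np.abs(w).max()
    bins = np.linspace(-max_abs, max_abs, num_bins + 1)
    p, _ = np.histogram(w, bins=bins)          # original distribution
    best_t, best_kl = max_abs, float("inf")
    for t in np.linspace(max_abs / num_candidates, max_abs, num_candidates):
        q, _ = np.histogram(simulate_int8(w, t), bins=bins)
        kl = kl_divergence(p.astype(float), q.astype(float))
        if kl < best_kl:
            best_t, best_kl = t, kl
    return best_t

# Channel-wise calibration: one threshold, hence one scale, per output channel.
rng = np.random.default_rng(0)
conv_w = rng.normal(0.0, 0.02, size=(16, 3, 3, 3))  # (out_ch, in_ch, kh, kw), toy weights
scales = [best_threshold(conv_w[c].ravel()) / 127.0 for c in range(conv_w.shape[0])]
```

Choosing the scale per channel rather than per layer follows the abstract's "quantize the parameters channel by channel": channels with narrow weight ranges get a finer scale instead of inheriting the layer-wide maximum.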

Key words: object detection, neural network compression, computation acceleration, network pruning, network quantization, ZYNQ platform
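The Winograd algorithm combined with the 8-bit convolution module trades multiplications for additions, which suits FPGA fabrics where DSP slices are the scarce resource. A minimal 1-D F(2,3) sketch of the idea (the paper's 2-D, fixed-point version on ZYNQ is not shown here; this float illustration is ours) computes two outputs of a 3-tap convolution with 4 multiplications instead of the naive 6:

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap 1-D convolution from 4 inputs,
    using 4 multiplications (m1..m4) instead of the naive 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

def naive_conv3(d, g):
    """Reference: direct 3-tap sliding-window convolution (correlation form)."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

The filter-side transforms (g[0]+g[1]+g[2])/2 and (g[0]-g[1]+g[2])/2 depend only on the weights, so on hardware they are precomputed once per filter and the per-tile cost stays at 4 multiplications.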



CLC Number: