• Artificial Intelligence •

### YOLOv3 compression and acceleration based on ZYNQ platform

1. Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an Shaanxi 710049, China;
2. College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an Shaanxi 710049, China
• Received: 2020-07-09 Revised: 2020-11-12 Online: 2021-03-10 Published: 2020-12-08
• Corresponding author: SU Yuanqi
• About the authors: GUO Wenxu (born 1996), male, from Yangling, Shaanxi, M.S. candidate; main research interests: computer vision, deep learning. SU Yuanqi (born 1982), male, from Rugao, Jiangsu, lecturer, Ph.D., CCF member; main research interests: computer vision, pattern recognition. LIU Yuehu (born 1962), male, from Hequ, Shanxi, professor, Ph.D.; main research interests: computer vision, pattern recognition.
• Supported by:
This work is partially supported by the National Natural Science Foundation of China (61973245).

Abstract: High-accuracy object detection networks are hard to deploy directly on end devices such as vehicles and drones because of their large parameter counts and computational cost. To address this problem, a new compression scheme for residual networks was proposed to compress YOLOv3 (You Only Look Once v3), considering both network compression and computation acceleration, and the compressed network was then accelerated on the ZYNQ platform. Firstly, a network compression algorithm combining network pruning and network quantization was proposed. For network pruning, a strategy for residual structures was introduced that divides pruning into two granularities: channel pruning and residual connection pruning. This overcame the limitations of channel pruning on residual connections and further reduced the number of model parameters. For network quantization, a relative entropy-based simulated quantization was used to quantize the parameters channel by channel, with online statistics of the parameter distribution and of the information loss caused by quantization guiding the choice of quantization strategy, so as to reduce the precision loss during quantization. Secondly, an 8-bit convolution acceleration module was designed and optimized on the ZYNQ platform; its on-chip cache structure was optimized and it was combined with the Winograd algorithm to accelerate the compressed YOLOv3. Experimental results show that the proposed solution achieves a smaller model size and higher accuracy (an increase of 7 percentage points) than YOLOv3 tiny. Meanwhile, the hardware acceleration method on the ZYNQ platform achieves a higher energy efficiency ratio than other platforms, which helps the practical deployment of YOLOv3 and other residual networks on ZYNQ-based end devices.
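
To illustrate the Winograd algorithm mentioned in the abstract, the following is a minimal sketch of the one-dimensional F(2,3) variant, which produces two outputs of a 3-tap filter using four multiplications instead of six. The abstract does not say which tile size the acceleration module uses, so F(2,3) is an assumption made here for illustration. The function names `direct_conv_1d` and `winograd_f23` are hypothetical, and the sketch uses floating-point arithmetic rather than the 8-bit fixed-point arithmetic of the hardware module.

```python
import math


def direct_conv_1d(d, g):
    """Reference: two outputs of a 3-tap filter over a 4-element input tile.

    Uses 6 multiplications (3 per output).
    """
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with 4 multiplications.

    Illustrative assumption: the tile size F(2,3) is not taken from the paper.
    """
    # Filter transform. It depends only on the weights, so in practice it
    # can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # Element-wise products: the only 4 multiplications on the data path.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    tile = [4, -1, 2, 7]
    weights = [2, 3, 5]
    ref = direct_conv_1d(tile, weights)
    fast = winograd_f23(tile, weights)
    assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(ref, fast))
    print(ref, fast)
```

The 2D form commonly used for 3×3 convolution layers nests this transform along rows and columns. The saving in multiplications is paid for with extra additions and a transformed-weight buffer, which is one reason on-chip cache layout matters for such a design.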