Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2836-2844. DOI: 10.11772/j.issn.1001-9081.2022081259

• Advanced Computing •

Deep neural network model acceleration method based on tensor virtual machine

Yunfei SHEN1, Fei SHEN2,3, Fang LI2,3, Jun ZHANG2,3

  1. Institute of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230031, China
    2. High Magnetic Field Laboratory, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, China
    3. High Magnetic Field Laboratory of Anhui Province, Hefei, Anhui 230031, China
  • Received: 2022-08-25  Revised: 2022-11-02  Accepted: 2022-11-08  Online: 2023-01-11  Published: 2023-09-10
  • Contact: Jun ZHANG
  • About author: SHEN Yunfei, born in 1996, M. S. candidate. Her research interests include computer vision and deep learning compilers.
    SHEN Fei, born in 1971, M. S., senior engineer. Her research interests include bionic sensing, networked sensors, and human-computer interaction.
    LI Fang, born in 1985, Ph. D., senior engineer. Her research interests include measurement and control, and data governance technology.
  • Supported by:
    Key Research and Development Program of Anhui Province (202004h07020031); Key Program of Research and Development of Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003)

Abstract:

With the vigorous development of Artificial Intelligence (AI) technology, Deep Neural Network (DNN) models have been widely applied to various mobile and edge devices. However, edge devices have low computing power and small memory capacity, and accelerating a model on them requires in-depth knowledge of the device hardware, which makes model deployment difficult and limits the popularization and application of the models. Therefore, a DNN acceleration and deployment method based on Tensor Virtual Machine (TVM) was presented to accelerate Convolutional Neural Network (CNN) models on a Field-Programmable Gate Array (FPGA), and the feasibility of this method was verified in the application scenario of distracted driving classification. Specifically, in the proposed method, computational graph optimization was utilized to reduce the memory access and computational overhead of the model, model quantization was used to reduce the model size, and computational graph packing was adopted to offload the convolution computation to the FPGA in order to speed up model inference. Compared with a MicroProcessor Unit (MPU) alone, the proposed method can reduce the inference time of ResNet50 and ResNet18 on MPU+FPGA by 88.63% and 77.53% respectively. On the AUC (American University in Cairo) dataset, compared to the MPU, the top-1 inference accuracies of the two models on MPU+FPGA are only reduced by 0.26 and 0.16 percentage points respectively. These results show that the proposed method can reduce the difficulty of deploying different models on FPGAs.
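The steps named above map naturally onto TVM's Python API. The sketch below is a minimal illustration of such a flow, assuming the FPGA is driven through TVM's VTA (Versatile Tensor Accelerator) back end, which provides a quantization and graph-packing path of the kind described here; it is not the authors' implementation. The helper load_resnet_relay_module() is a hypothetical placeholder for a Relay frontend import, and values such as global_scale, start_name and stop_name are illustrative defaults taken from typical VTA examples.

    import tvm
    from tvm import relay
    import vta
    from vta.top import graph_pack

    # VTA hardware configuration (read from vta_config.json) and compile target:
    # "ext_dev" for the FPGA fabric, with the board's CPU (the MPU) as host.
    env = vta.get_env()
    target = tvm.target.Target(env.target, host=env.target_host)

    # Placeholder: obtain a Relay module for ResNet18/ResNet50 from any frontend,
    # e.g. relay.frontend.from_pytorch or relay.frontend.from_mxnet.
    mod, params = load_resnet_relay_module()

    with tvm.transform.PassContext(opt_level=3):
        # 1) Model quantization: float32 -> int8, shrinking the model and matching
        #    the accelerator's integer datapath (global_scale is illustrative).
        with relay.quantize.qconfig(global_scale=8.0, skip_conv_layers=[0]):
            mod = relay.quantize.quantize(mod, params=params)

        # 2) Computational graph packing: retile conv2d data and weights into the
        #    blocked layout the accelerator expects, so the convolutions between
        #    start_name and stop_name are offloaded to the FPGA.
        relay_prog = graph_pack(
            mod["main"],
            env.BATCH,                          # batch tiling factor
            env.BLOCK_OUT,                      # output-channel tiling factor
            env.WGT_WIDTH,                      # weight bit width
            start_name="nn.max_pool2d",         # first packed operator (illustrative)
            stop_name="nn.global_avg_pool2d",   # last packed operator (illustrative)
        )

    # 3) Graph-level optimization and code generation; operators the accelerator
    #    cannot execute stay on the host CPU of the MPU+FPGA platform.
    with vta.build_config(opt_level=3, disabled_pass={"AlterOpLayout"}):
        lib = relay.build(relay_prog, target=target, params=params)

    # lib.export_library("resnet_vta.so") would then produce a module that the
    # graph executor on the board can load for inference.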

Key words: Tensor Virtual Machine (TVM), Deep Neural Network (DNN), Field-Programmable Gate Array (FPGA), edge device, model deployment, model acceleration

CLC Number: