Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2828-2835. DOI: 10.11772/j.issn.1001-9081.2022081177

• Advanced Computing •

Adaptive partitioning and scheduling method of convolutional neural network inference model on heterogeneous platforms

Shaofa SHANG1, Lin JIANG1, Yuancheng LI1, Yun ZHU2

  1. College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an Shaanxi 710600, China
    2. School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi 710121, China
  • Received: 2022-08-10; Revised: 2022-12-01; Accepted: 2022-12-08; Online: 2023-01-18; Published: 2023-09-10
  • Corresponding author: Lin JIANG
  • About the authors: SHANG Shaofa (1998—), male, born in Weinan, Shaanxi, M.S. candidate. His research interests include compiler optimization and deep learning.
    LI Yuancheng (1981—), male, born in Kaifeng, Henan, lecturer, Ph.D., CCF member. His research interests include computer architecture, parallel computing, and artificial intelligence.
    ZHU Yun (1981—), female, born in Xi'an, Shaanxi, lecturer, M.S. Her research interests include integrated circuit design and simulation.
  • Supported by:
    National Natural Science Foundation of China (61834005); Natural Science Foundation of Shaanxi Province (2020JM-525); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0104603); Yulin Science and Technology Program (CXY-2020-026)

Adaptive partitioning and scheduling method of convolutional neural network inference model on heterogeneous platforms

Shaofa SHANG1, Lin JIANG1(), Yuancheng LI1, Yun ZHU2   

  1. College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an Shaanxi 710600, China
    2. School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi 710121, China
  • Received: 2022-08-10; Revised: 2022-12-01; Accepted: 2022-12-08; Online: 2023-01-18; Published: 2023-09-10
  • Contact: Lin JIANG
  • About the authors: SHANG Shaofa, born in 1998, M.S. candidate. His research interests include compiler optimization and deep learning.
    LI Yuancheng, born in 1981, Ph.D., lecturer, CCF member. His research interests include computer architecture, parallel computing, and artificial intelligence.
    ZHU Yun, born in 1981, M.S., lecturer. Her research interests include integrated circuit design and simulation.
  • Supported by:
    National Natural Science Foundation of China (61834005); Natural Science Foundation of Shaanxi Province (2020JM-525); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0104603); Yulin Science and Technology Program (CXY-2020-026)

Abstract:

To address the low hardware resource utilization and high latency of Convolutional Neural Network (CNN) inference on heterogeneous platforms, an adaptive partitioning and scheduling method for CNN inference models is proposed. First, the key operators of the CNN are extracted by traversing the computational graph to complete adaptive model partitioning, enhancing the flexibility of the scheduling strategy. Then, based on measured performance and a critical-path greedy search algorithm, the optimal running load is selected on the CPU-GPU heterogeneous platform according to each sub-model's runtime characteristics, improving sub-model inference speed. Finally, the cross-device scheduling mechanism in Tensor Virtual Machine (TVM) is used to configure the dependencies and running loads of the sub-models, achieving adaptive scheduling of model inference and reducing inter-device communication latency. Experimental results show that, compared with TVM operator optimization on GPU and CPU, the proposed method improves inference speed by 5.88%-19.05% and 45.45%-311.46% respectively, with no loss of model inference accuracy.

Keywords: Tensor Virtual Machine (TVM), Convolutional Neural Network (CNN), model partitioning, task scheduling, characteristic analysis

Abstract:

To address the low hardware resource utilization and high latency of Convolutional Neural Network (CNN) inference on heterogeneous platforms, an adaptive partitioning and scheduling method for CNN inference models was proposed. Firstly, the key operators of the CNN were extracted by traversing the computational graph to partition the model adaptively, enhancing the flexibility of the scheduling strategy. Then, based on measured performance and a critical-path greedy search algorithm, the optimal running load was selected on the CPU-GPU heterogeneous platform according to each sub-model's runtime characteristics, improving sub-model inference speed. Finally, the cross-device scheduling mechanism in TVM (Tensor Virtual Machine) was used to configure the dependencies and running loads of the sub-models, achieving adaptive scheduling of model inference and reducing inter-device communication latency. Experimental results show that, compared with TVM operator optimization on GPU and CPU, the proposed method improves inference speed by 5.88% to 19.05% and 45.45% to 311.46% respectively, with no loss of model inference accuracy.
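The greedy device-selection idea sketched in the abstract — assigning each sub-model to CPU or GPU based on measured latency while accounting for cross-device transfer cost — can be illustrated with a minimal toy in Python. This is not the authors' implementation: the function name `greedy_schedule`, the chain-of-sub-models simplification, and all profiling numbers are hypothetical.

```python
def greedy_schedule(latency, transfer):
    """Greedily place a chain of sub-models on devices.

    latency: list of dicts, latency[i] = {"cpu": ms, "gpu": ms},
             measured inference time of sub-model i on each device.
    transfer: cost (ms) of one CPU<->GPU copy between consecutive
              sub-models placed on different devices.
    Returns a list of chosen devices, one per sub-model.
    """
    placement = []
    prev = None  # device of the previous sub-model in the chain
    for lat in latency:
        best_dev, best_cost = None, float("inf")
        for dev, t in lat.items():
            # pay a transfer penalty when switching devices mid-chain
            cost = t + (transfer if prev is not None and dev != prev else 0.0)
            if cost < best_cost:
                best_dev, best_cost = dev, cost
        placement.append(best_dev)
        prev = best_dev
    return placement

# Hypothetical measured latencies (ms) for three sub-models.
profile = [
    {"cpu": 4.0, "gpu": 1.0},   # conv-heavy: GPU much faster
    {"cpu": 0.5, "gpu": 3.0},   # lightweight layer: CPU competitive
    {"cpu": 6.0, "gpu": 1.5},   # conv-heavy again
]
print(greedy_schedule(profile, transfer=2.0))
```

With a moderate transfer cost the middle sub-model still moves to the CPU, while a large transfer cost pins the whole chain to the GPU — the trade-off the paper's scheduler navigates with real profiling data.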

Key words: Tensor Virtual Machine (TVM), Convolutional Neural Network (CNN), model partitioning, task scheduling, characteristic analysis

CLC number: