Journal of Computer Applications, 2024, Vol. 44, Issue (2): 526-535. DOI: 10.11772/j.issn.1001-9081.2023020213

• Computer software technology

Design and implementation of component-based development framework for deep learning applications

Xiang LIU, Bei HUA, Fei LIN, Hongyuan WEI

  1. College of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230027, China
  • Received: 2023-03-03  Revised: 2023-04-19  Accepted: 2023-05-05  Online: 2023-08-14  Published: 2024-02-10
  • Contact: Bei HUA
  • About author: LIU Xiang, born in 1999, a native of Suzhou, Anhui, M. S. candidate. His research interests include high-performance computing systems.
    LIN Fei, born in 1994, a native of Hefei, Anhui, M. S. candidate. His research interests include high-performance computing systems.
    WEI Hongyuan, born in 1996, a native of Xinxiang, Henan, M. S. candidate. Her research interests include high-performance computing systems.
  • Supported by:
    National Key Research & Development Program of China (2018AAA0101204)

面向深度学习应用的组件式开发框架的设计实现

刘祥, 华蓓, 林飞, 魏宏原

Abstract:

To address the current lack of effective development and deployment tools for deep learning applications, a component-based development framework for deep learning applications is proposed. The framework splits an application into functions according to the type of resource each one consumes, uses an evaluation-guided resource allocation scheme to eliminate bottlenecks, and uses a stepwise bin-packing scheme for function placement that balances high CPU utilization against low GPU memory overhead. A real-time license plate number detection application developed with this framework achieved 82% GPU utilization in throughput-first mode, an average application latency of 0.73 s in latency-first mode, and an average CPU utilization of 68.8% across the three modes (throughput-first, latency-first, and balanced throughput/latency). These results show that the framework supports configurations that balance hardware throughput against application latency: it uses the platform's computing resources efficiently in throughput-first mode and meets the application's low-latency requirements in latency-first mode. Compared with MediaPipe, the framework enabled the development of a faster-than-real-time multi-person pose estimation application, and the detection frame rate of the application improved by up to 1077%. Overall, the experiments indicate that the framework is an effective solution for developing and deploying deep learning applications on CPU-GPU heterogeneous servers.
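
The abstract does not spell out how the resource allocation scheme for bottleneck elimination works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each pipeline stage has a measured per-item cost and that throughput scales linearly with the number of workers; the stage names, costs, `Stage` class, and `eliminate_bottlenecks` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage (hypothetical model, not the paper's component type)."""
    name: str
    secs_per_item: float  # assumed measured cost of one item on one worker
    workers: int = 1

    @property
    def throughput(self) -> float:
        # Items per second, assuming ideal linear scaling with worker count.
        return self.workers / self.secs_per_item


def eliminate_bottlenecks(stages, spare_workers):
    """Greedily hand each spare worker to the currently slowest stage.

    Returns the pipeline rate, which is bounded by the slowest stage.
    """
    for _ in range(spare_workers):
        slowest = min(stages, key=lambda s: s.throughput)
        slowest.workers += 1
    return min(s.throughput for s in stages)


# Made-up measurements for three CPU-side stages.
stages = [
    Stage("decode", 0.020),
    Stage("preprocess", 0.010),
    Stage("postprocess", 0.005),
]
pipeline_rate = eliminate_bottlenecks(stages, spare_workers=4)
for s in stages:
    print(f"{s.name}: {s.workers} worker(s), {s.throughput:.0f} items/s")
print(f"pipeline rate: {pipeline_rate:.0f} items/s")
```

With these made-up numbers the four spare workers go to the two slower stages until all three stages run at the same rate. A real deployment would re-measure after each change rather than trust the linear-scaling assumption.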

Key words: deep learning application, development framework, Component-Based Development (CBD), pipeline deployment, CPU-GPU heterogeneity
