Journal of Computer Applications, 2023, Vol. 43, Issue 9: 2828-2835. DOI: 10.11772/j.issn.1001-9081.2022081177
• Advanced computing •
Shaofa SHANG1, Lin JIANG1, Yuancheng LI1, Yun ZHU2
Received: 2022-08-10
Revised: 2022-12-01
Accepted: 2022-12-08
Online: 2023-01-18
Published: 2023-09-10
Contact: Lin JIANG
About author: SHANG Shaofa, born in 1998 in Weinan, Shaanxi, male, M. S. candidate. His research interests include compilation optimization and deep learning.
Shaofa SHANG, Lin JIANG, Yuancheng LI, Yun ZHU. Adaptive partitioning and scheduling method of convolutional neural network inference model on heterogeneous platforms[J]. Journal of Computer Applications, 2023, 43(9): 2828-2835.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022081177
Tab. 1 Composition structures of three CNN models

| Model | Main composition |
|---|---|
| VGG16 | 5×Conv_Block + 3×Fc |
| ResNet18 | 1×Conv_Block + 4×Residual_Block (including 19×Conv_Block) + 1×Fc |
| GoogLeNet | 3×Conv_Block + 9×Inception_Block (including 64×Conv_Block) + 1×Fc |
Tab. 2 Comparison of execution time of VGG16 submodels on different devices

| Submodel | CPU | GPU |
|---|---|---|
| C_1 | 1.08 | 0.35 |
| C_2 | 7.62 | 0.75 |
| C_3 | 5.46 | 1.56 |
| C_4 | 2.19 | 1.85 |
| C_5 | 0.84 | 0.96 |
| FC | 0.46 | 3.75 |
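The per-device timings above suggest a simple baseline: run each submodel on whichever device executes it faster. A minimal sketch in Python, using the measurements from Tab. 2 (the function name and dictionary layout are illustrative, not from the paper, and this greedy rule ignores inter-device transfer costs):

```python
# Per-submodel execution times from Tab. 2 (VGG16), units as reported.
times = {
    "C_1": {"CPU": 1.08, "GPU": 0.35},
    "C_2": {"CPU": 7.62, "GPU": 0.75},
    "C_3": {"CPU": 5.46, "GPU": 1.56},
    "C_4": {"CPU": 2.19, "GPU": 1.85},
    "C_5": {"CPU": 0.84, "GPU": 0.96},
    "FC":  {"CPU": 0.46, "GPU": 3.75},
}

def greedy_assign(times):
    """Map each submodel to the device with the lower measured time."""
    return {mod: min(devs, key=devs.get) for mod, devs in times.items()}

assignment = greedy_assign(times)
# With these numbers, C_1..C_4 land on the GPU, while C_5 and FC stay on the CPU.
```

On these measurements the convolution-heavy blocks favor the GPU while the fully connected layer favors the CPU, which is the asymmetry an adaptive partitioner can exploit.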
Tab. 3 Experimental platform configuration information

| Component | Item | Description |
|---|---|---|
| CPU | Model | 2×Intel Xeon Gold 6248R |
| | Physical cores | 48 |
| | Threads | 96 |
| | L3 cache | 35.75 MB |
| | Memory capacity | 256 GB DDR4 |
| | Memory frequency | 3 200 MHz |
| GPU | Model | 1×Nvidia Quadro P2200 |
| | Cores | 1 280 CUDA parallel processing cores |
| | VRAM capacity | 5 GB GDDR5X |
| | Single-precision performance | up to 3.8 TFLOPS |
| System | OS | Ubuntu 18.04 |
| | Kernel | 5.4.0-96-generic |
| | Deep learning framework | PyTorch 3.6 |
| | Deep learning compiler | TVM 0.9 |
Tab. 4 Computational amount and parameters of convolutional and classification layers of different models

| Model | Conv computation/FLOPs | Conv parameters/MB | Classifier computation/FLOPs | Classifier parameters/MB |
|---|---|---|---|---|
| VGG11 | 7.506 | 9.220 | 0.124 | 123.643 |
| AlexNet | 0.657 | 2.470 | 0.059 | 58.631 |
| MobileNet | 0.319 | 2.224 | 0.001 | 1.281 |
| SqueezeNet | 0.269 | 0.722 | 0.513 | 0.087 |
| ResNet18 | 1.821 | 11.177 | 0.001 | 0.513 |
| GoogLeNet | 1.507 | 5.600 | 0.001 | 1.025 |
Tab. 5 Partitioning and device assignment of CNN models

| Model | Layers | Partitioning result | CPU assignment | GPU assignment |
|---|---|---|---|---|
| AlexNet | 32 | 0-3, 4-7, 8-17, 18-31 | 0-3, 18-31 | 4-7, 8-17 |
| VGG11 | 43 | 0-3, 4-7, 8-14, 15-21, 22-28, 29-42 | 0-3, 29-42 | 4-7, 8-14, 15-21, 22-28 |
| ResNet18 | 72 | 0-3, 4-10, 11-17, 18-26, 27-33, 34-42, 43-50, 51-59, 60-66, 67-71 | 67-71 | 0-3, 4-10, 11-17, 18-26, 27-33, 34-42, 43-50, 51-59, 60-66 |
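Each row of Tab. 5 splits a model's layer sequence into contiguous segments that together cover every layer exactly once; the devices then receive disjoint subsets of those segments. A small checker (a hypothetical helper, not part of the paper's method) makes that invariant explicit:

```python
def is_valid_partition(segments, num_layers):
    """Check that (start, end) segments are ordered, contiguous,
    and cover layers 0..num_layers-1 exactly once."""
    expected_start = 0
    for start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == num_layers

# AlexNet row of Tab. 5: 32 layers split into four segments.
alexnet_segments = [(0, 3), (4, 7), (8, 17), (18, 31)]
```

A validator like this is useful as a sanity check whenever a partitioner emits candidate layer ranges before they are scheduled onto devices.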