Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2802-2809. DOI: 10.11772/j.issn.1001-9081.2023091252
Advanced computing
Na WANG¹, Lin JIANG¹, Yuancheng LI¹, Yun ZHU²
Received: 2023-09-18; Revised: 2023-11-14; Accepted: 2023-11-20; Online: 2024-03-15; Published: 2024-09-10
Contact: Lin JIANG
About author: WANG Na (born 1994), female, from Weinan, Shaanxi, M. S. candidate. Her research interests include reconfigurable compilation optimization and deep learning.
Na WANG, Lin JIANG, Yuancheng LI, Yun ZHU. Optimization of tensor virtual machine operator fusion based on graph rewriting and fusion exploration[J]. Journal of Computer Applications, 2024, 44(9): 2802-2809.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023091252
Model | Layers before fusion | Layers after TVM operator fusion | Layer reduction/% |
---|---|---|---|
SqueezeNet | 10 | 9 | 10.0 |
VGG19 | 19 | 17 | 10.5 |
MobileNet | 53 | 47 | 11.3 |
Inception v4 | 76 | 66 | 13.1 |
ResNet | 152 | 128 | 15.7 |
DenseNet | 264 | 218 | 17.4 |
Tab. 1 Comparison of number of neural network layers before and after fusion of different models
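The last column of Tab. 1 can be reproduced from the two layer counts as (before - after) / before × 100. The short Python sketch below (the names `LAYERS` and `reduction_percent` are illustrative, not from the paper) shows that the published values match when the quotient is truncated to one decimal place; ordinary rounding would instead give 13.2 for Inception v4 and 15.8 for ResNet.

```python
from decimal import Decimal, ROUND_DOWN

# Layer counts from Tab. 1: (layers before fusion, layers after TVM operator fusion)
LAYERS = {
    "SqueezeNet": (10, 9),
    "VGG19": (19, 17),
    "MobileNet": (53, 47),
    "Inception v4": (76, 66),
    "ResNet": (152, 128),
    "DenseNet": (264, 218),
}


def reduction_percent(before: int, after: int) -> Decimal:
    """Share of layers removed by fusion, truncated (not rounded) to one decimal."""
    exact = Decimal(before - after) * 100 / Decimal(before)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_DOWN)


for model, (before, after) in LAYERS.items():
    print(f"{model}: {reduction_percent(before, after)}%")
```

Using `Decimal` with an explicit `ROUND_DOWN` makes the truncation rule visible instead of relying on float formatting.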
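Other operator type (rows) / First operator type (columns) | One-to-one | One-to-many | Many-to-many |
---|---|---|---|
One-to-one | One-to-one | One-to-many | Many-to-many |
One-to-many | One-to-many | One-to-many | Many-to-many |
Many-to-many | Many-to-many | Many-to-many | — |
Tab. 2 Candidate block mapping types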
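Read as a lookup, Tab. 2 follows a simple pattern: a candidate block takes the more general of the two mapping types (one-to-one, then one-to-many, then many-to-many), and two many-to-many operators produce no entry. A minimal Python sketch of that lookup follows; the `Mapping` enum and `candidate_block_type` are hypothetical names, and reading the dash cell as "no candidate block is formed" is an assumption.

```python
from enum import IntEnum
from typing import Optional


class Mapping(IntEnum):
    """Operator mapping types from Tab. 2, ordered from least to most general."""
    ONE_TO_ONE = 0
    ONE_TO_MANY = 1
    MANY_TO_MANY = 2


def candidate_block_type(first: Mapping, other: Mapping) -> Optional[Mapping]:
    """Mapping type of a candidate block built from `first` and `other` (Tab. 2).

    Returns None for the dash cell; treating it as "no candidate block" is an assumption.
    """
    if first is Mapping.MANY_TO_MANY and other is Mapping.MANY_TO_MANY:
        return None
    # Every other cell of Tab. 2 equals the more general of the two types.
    return max(first, other)


# Print the full table: rows are the other operator, columns the first operator.
for other in Mapping:
    row = [candidate_block_type(first, other) for first in Mapping]
    print(other.name, [m.name if m is not None else "-" for m in row])
```

Encoding the types as an ordered `IntEnum` lets `max` express the table in one line, so only the single exceptional cell needs special handling.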
Method | Before rewriting: graph structure | Before rewriting: computation | After rewriting: graph structure | After rewriting: computation |
---|---|---|---|---|
Associative | Recip(B)⊙A⊙Recip(A⊙B) | 4m×n | Recip(Square(B)) | 2m×n |
Associative | ( | 5m×n | A⊙B⊙C | 2m×n |
Associative | (A⊙(B⊙C))⁻¹⊙(B⊙C)⁻¹ | 6m×n | A⁻¹⊙(B⊙C)⁻² | 4m×n |
Distributive | (A·B)⊙C+(A·B)⊙D | 5m×n | (A·B)⊙(C+D) | 3m×n |
Distributive | A⊙B-A⊙C | 3m×n | A⊙(B-C) | 2m×n |
Distributive | Square(A-B)+(A-B)⊙C | 5m×n | (A-B)⊙(A-B+C) | 4m×n |
Commutative | A+B+C | 2m×n | C+B+A | 2m×n |
Commutative | A⊙B | m×n | B⊙A | m×n |
Commutative | ReduceSum(BitShift(A)) | 2m×n | BitShift(ReduceSum(A)) | m×n+m |
Tab. 3 Comparison of computational complexity of operators before and after graph rewriting
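The distributive rows of Tab. 3 can be reproduced with a tiny term rewriter. The Python sketch below is illustrative and is not the paper's implementation: expressions are nested tuples, `"hmul"` stands for the element-wise product ⊙ and `"matmul"` for ·, and the cost model (an assumption that happens to match these rows) charges one m×n pass per operator node, with no sharing of repeated subexpressions such as A·B. Because ⊙ is commutative, the rule looks for a common factor on either side of each product, and Square(X) is treated as X⊙X. Only the distributive rows are covered.

```python
from typing import Optional, Tuple, Union

# An expression is a tensor name or a tuple (op, *operands).
Expr = Union[str, tuple]


def cost(e: Expr) -> int:
    """Operator nodes in the tree; each counts as one m×n pass (simplified cost model)."""
    if isinstance(e, str):
        return 0
    return 1 + sum(cost(child) for child in e[1:])


def as_product(e: Expr) -> Optional[Tuple[Expr, Expr]]:
    """View e as an element-wise product (x, y), if it is one."""
    if isinstance(e, tuple):
        if e[0] == "hmul":
            return e[1], e[2]
        if e[0] == "square":  # Square(X) == X ⊙ X
            return e[1], e[1]
    return None


def factor(op: str, left: Expr, right: Expr) -> Optional[Expr]:
    """Distributive rule: X⊙Y op X⊙Z -> X⊙(Y op Z), for op in {add, sub}."""
    lhs, rhs = as_product(left), as_product(right)
    if lhs is None or rhs is None:
        return None
    for i in (0, 1):  # ⊙ is commutative, so try both operand orders
        for j in (0, 1):
            if lhs[i] == rhs[j]:
                return ("hmul", lhs[i], (op, lhs[1 - i], rhs[1 - j]))
    return None


def rewrite(e: Expr) -> Expr:
    """Apply the distributive rule bottom-up, once per node."""
    if isinstance(e, str):
        return e
    e = (e[0],) + tuple(rewrite(child) for child in e[1:])
    if e[0] in ("add", "sub"):
        factored = factor(e[0], e[1], e[2])
        if factored is not None:
            return factored
    return e


AB = ("matmul", "A", "B")
examples = [
    ("add", ("hmul", AB, "C"), ("hmul", AB, "D")),    # (A·B)⊙C+(A·B)⊙D
    ("sub", ("hmul", "A", "B"), ("hmul", "A", "C")),  # A⊙B-A⊙C
    ("add", ("square", ("sub", "A", "B")),            # Square(A-B)+(A-B)⊙C
     ("hmul", ("sub", "A", "B"), "C")),
]
for before in examples:
    after = rewrite(before)
    print(cost(before), "->", cost(after), after)
```

Under this counting the three examples go from 5, 3 and 5 passes to 3, 2 and 4, the same figures as the distributive rows of the table.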
Method | Before rewriting: graph structure | Before rewriting: computation | After rewriting: graph structure | After rewriting: computation |
---|---|---|---|---|
Commutative + distributive | A·B+A·C+A·B | 5m×n | A·(B+C+A) | 3m×n |
Commutative + distributive | Mul(A·B)+A·C | 4m×n | | |
Associative + distributive | ( | 9m×n | | 6m×n |
Associative + distributive | B·C·(A+ | 4m×n | | |
Commutative + associative | Recip(A)·B·Recip(B)·C·Recip(C)+Recip(Square(A))·B·C·Recip(B)·C | 11m×n | Recip(A)+Recip(Square(A))·Square(C) | 4m×n |
Commutative + associative | Recip(A)·B·Recip(B)·C·(Recip(C)+Recip(A)·C) | 7m×n | | |
Tab. 4 Comparison of computational complexity after graph rewriting using different rewriting rules
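In the commutative + associative group of Tab. 4, the cancellations B·Recip(B) and C·Recip(C) only hold if · is an element-wise product, which this sketch assumes. Under that assumption the rewrite can be sanity-checked on scalars, because an element-wise identity holds for tensors exactly when it holds for each element. The Python sketch below (function names are hypothetical) evaluates the 11m×n form, the 7m×n intermediate form and the 4m×n result on random nonzero rationals using `fractions.Fraction`, so the comparison is exact rather than subject to floating-point error.

```python
import random
from fractions import Fraction
from typing import Callable


def recip(x: Fraction) -> Fraction:
    return 1 / x


def square(x: Fraction) -> Fraction:
    return x * x


# Tab. 4, commutative + associative group; "·" read as an element-wise product (assumption).
def original(a, b, c):   # 11m×n form
    return recip(a) * b * recip(b) * c * recip(c) + recip(square(a)) * b * c * recip(b) * c


def partial(a, b, c):    # 7m×n form
    return recip(a) * b * recip(b) * c * (recip(c) + recip(a) * c)


def rewritten(a, b, c):  # 4m×n form
    return recip(a) + recip(square(a)) * square(c)


def agree(f: Callable, g: Callable, trials: int = 200, seed: int = 0) -> bool:
    """True if f and g give identical results on `trials` random nonzero rational inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = [Fraction(rng.choice((-1, 1)) * rng.randint(1, 50), rng.randint(1, 50))
                for _ in range(3)]
        if f(*args) != g(*args):
            return False
    return True


print(agree(original, partial), agree(original, rewritten))  # True True
```

A random check like this can refute a proposed rewrite quickly, but passing it is evidence rather than proof; a symbolic argument (here, B·Recip(B) = C·Recip(C) = 1 element-wise) is still needed.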