Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (8): 2646-2655. DOI: 10.11772/j.issn.1001-9081.2024081092
• Advanced Computing •
Chao JING, Yutao QUAN, Yan CHEN
Received: 2024-08-05
Revised: 2024-10-20
Accepted: 2024-10-31
Online: 2024-11-19
Published: 2025-08-10
Contact: Yan CHEN
About author: JING Chao, born in 1983 in Changge, Henan, Ph. D., professor, CCF senior member. His research interests include high-performance computing and intelligent optimization algorithms.
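Abstract:
Although heterogeneous computing systems can accelerate the processing of neural network parameters, they also sharply increase system power consumption. A sound power consumption prediction method is the basis for optimizing power consumption and handling multiple types of workloads on heterogeneous systems. To this end, a power consumption prediction algorithm for multi-type workloads on CPU/GPU heterogeneous computing systems was proposed by improving a Multi-Layer Perceptron (MLP)-attention model. Firstly, a feature-based workload power consumption model was established by considering server power consumption and system features. Secondly, because existing power consumption prediction algorithms cannot capture the long-range dependencies between system features and system power consumption, an improved MLP-attention-based power consumption prediction algorithm named Prophet was proposed. The algorithm uses an improved MLP to extract the system features at each time step and an attention mechanism to aggregate these features, thereby capturing the long-range dependencies between system features and system power consumption. Finally, experiments were carried out on a real system, and the proposed algorithm was compared with power consumption prediction algorithms such as MLSTM_PM (Power consumption Model based on Multi-layer Long Short-Term Memory) and ENN_PM (Power consumption Model based on Elman Neural Network). Experimental results show that Prophet achieves high prediction accuracy: compared with MLSTM_PM, it reduces the Mean Relative Error (MRE) by 1.22, 1.01 and 0.93 percentage points on the workloads blk, memtest and busspd, respectively. Prophet also has low complexity, which demonstrates the effectiveness and feasibility of the proposed algorithm.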
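To make the idea in the abstract concrete (an MLP applied to each time step, followed by attention over the window), the following is a minimal pure-Python sketch. It is not the authors' Prophet implementation: the hidden width of 32, the random weights, the single query vector, the choice of 18 input features (one per row of Tab. 1), and the names `mlp_embed`, `attention_pool`, `predict_power` and `init_params` are all assumptions made for illustration. Only the embedding dimension (16) and the window size (8) are taken from Tab. 2.

```python
import math
import random

def mlp_embed(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU: maps one time step's features to an embedding."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(embeddings, query):
    """Scaled dot-product attention of one query vector over all time steps."""
    d = len(query)
    scores = [sum(q * e for q, e in zip(query, emb)) / math.sqrt(d)
              for emb in embeddings]
    weights = softmax(scores)
    pooled = [sum(w * emb[j] for w, emb in zip(weights, embeddings))
              for j in range(d)]
    return pooled, weights

def predict_power(window, params):
    """Embed every time step, pool with attention, then apply a linear output layer."""
    embeddings = [mlp_embed(x, params["w1"], params["b1"], params["w2"], params["b2"])
                  for x in window]
    pooled, weights = attention_pool(embeddings, params["query"])
    power = sum(w * p for w, p in zip(params["w_out"], pooled)) + params["b_out"]
    return power, weights

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(n_features, hidden=32, embed=16, seed=0):
    """Random, untrained parameters (hidden=32 is an assumed value)."""
    rng = random.Random(seed)
    return {
        "w1": random_matrix(hidden, n_features, rng), "b1": [0.0] * hidden,
        "w2": random_matrix(embed, hidden, rng), "b2": [0.0] * embed,
        "query": [rng.uniform(-0.1, 0.1) for _ in range(embed)],
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(embed)], "b_out": 0.0,
    }

# Toy input: 8 time steps, each with 18 synthetic feature values.
rng = random.Random(1)
window = [[rng.random() for _ in range(18)] for _ in range(8)]
power, weights = predict_power(window, init_params(n_features=18))
```

The attention weights form a probability distribution over the 8 time steps, so any step in the window can contribute directly to the prediction regardless of its distance from the last step. That is the property the abstract relies on for long-range dependencies.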
Chao JING, Yutao QUAN, Yan CHEN. Improved multi-layer perceptron and attention model-based power consumption prediction algorithm[J]. Journal of Computer Applications, 2025, 45(8): 2646-2655.
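System metric | Unit | Description |
---|---|---|
CPU utilization | % | Utilization of each CPU core |
CPU frequency | MHz | Frequency of each CPU core |
Memory usage | MB | System memory in use |
Network I/O | % | Network interface card utilization |
Disk I/O speed | MB/s | Disk I/O speed |
Cache misses | - | Total number of cache misses across all CPU cores |
Cache references | - | Total number of cache references across all CPU cores |
L1 data cache loads | - | Number of loads from the CPU L1 data cache |
L1 data cache stores | - | Number of stores to the CPU L1 data cache |
Fan speed | % | GPU fan speed |
GPU power state | - | GPU power state |
GPU memory usage | MB | GPU memory in use |
GPU utilization | % | GPU utilization |
PCIe transmit | MB/s | PCIe data transmit rate |
PCIe receive | MB/s | PCIe data receive rate |
GPU temperature | ℃ | GPU core temperature |
GPU frequency | MHz | GPU core frequency |
GPU memory frequency | MHz | GPU memory frequency |
Tab. 1 Collected system features for CPU/GPU servers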
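Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
Embedding dimension | 16 | Optimizer | Adam |
Learning rate | 1×10⁻⁴ | Time window size | 8 |
Loss function | MSE | Weight decay | 1×10⁻³ |
Batch size | 32 | Early stopping threshold | 150 |
Tab. 2 Hyperparameter settings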
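The time window size and batch size in Tab. 2 imply that the monitored trace is cut into fixed-length windows before training. A minimal sketch of that step follows. It assumes that each sample pairs 8 consecutive feature vectors with the power reading at the last step of the window. The page does not spell out this pairing, and the helper names `make_windows` and `batches` are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

WINDOW_SIZE = 8   # "Time window size" in Tab. 2
BATCH_SIZE = 32   # "Batch size" in Tab. 2

Sample = Tuple[List[Sequence[float]], float]

def make_windows(features: Sequence[Sequence[float]],
                 power: Sequence[float],
                 window: int = WINDOW_SIZE) -> List[Sample]:
    """Pair each run of `window` consecutive feature vectors with the power
    measured at the final step of that run (an assumed labelling scheme)."""
    if len(features) != len(power):
        raise ValueError("features and power must have the same length")
    samples: List[Sample] = []
    for end in range(window, len(features) + 1):
        samples.append((list(features[end - window:end]), power[end - 1]))
    return samples

def batches(samples: List[Sample], batch_size: int = BATCH_SIZE) -> Iterator[List[Sample]]:
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Toy trace: 40 time steps with 2 synthetic features each.
features = [[float(t), float(t) * 2.0] for t in range(40)]
power = [100.0 + t for t in range(40)]
samples = make_windows(features, power)
batch_sizes = [len(b) for b in batches(samples)]
```

A trace of 40 steps yields 40 - 8 + 1 = 33 windows, which split into one full batch of 32 and one remainder batch of 1.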
Algorithm | oneDNN | gemm | blk | memtest | busspd |
---|---|---|---|---|---|
ENN_PM | 20.9 | 10.9 | 5.62 | 3.83 | |
MLSTM_PM | 47.8 | 11.2 | 9.78 | 4.22 | 6.06 |
CBLA_PM | 18.8 | 6.49 | 3.88 | 5.30 | |
MLR_PM | 31.0 | 6.8 | 2.85 | | |
Prophet | 23.0 | 5.54 | 2.55 | 3.23 | |
Tab. 3 Prediction MAE of each algorithm under different workloads (W)
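For reference, the sketch below computes MAE in watts, assuming the standard definition (the mean of the absolute differences between measured and predicted power). This page does not give the paper's exact formula, and the power readings in the example are invented.

```python
from typing import Sequence

def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted|, in the same unit as the inputs (here W)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Invented readings in watts; absolute errors are 2, 5, 3 and 0.
actual = [200.0, 210.0, 190.0, 205.0]
predicted = [198.0, 215.0, 193.0, 205.0]
mae = mean_absolute_error(actual, predicted)
```

Because MAE is in watts, its size depends on how much power a workload draws, so values in different columns of the table are not directly comparable.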
Algorithm | oneDNN | gemm | blk | memtest | busspd |
---|---|---|---|---|---|
ENN_PM | 8.44 | 1.55 | | | |
MLSTM_PM | 16.80 | 3.82 | 2.88 | 2.65 | 2.16 |
CBLA_PM | 6.31 | 2.05 | 2.31 | 2.46 | |
MLR_PM | 16.60 | 2.76 | 1.96 | | |
Prophet | 10.50 | 4.01 | 1.66 | 1.64 | 1.23 |
Tab. 4 Prediction MRE of each algorithm under different workloads (%)
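The sketch below does two things. First, it computes MRE as a percentage, assuming the usual definition (the mean of |actual - predicted| / |actual|, multiplied by 100), which this page does not state explicitly. Second, it reproduces the percentage-point reductions quoted in the abstract from the MLSTM_PM and Prophet rows of Tab. 4, the two rows for which all five values are present.

```python
from typing import Dict, Sequence

def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / |actual|, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Invented readings: both relative errors are 5 %.
mre_example = mean_relative_error([200.0, 100.0], [190.0, 105.0])

# MRE values (%) copied from Tab. 4.
mre_mlstm: Dict[str, float] = {"blk": 2.88, "memtest": 2.65, "busspd": 2.16}
mre_prophet: Dict[str, float] = {"blk": 1.66, "memtest": 1.64, "busspd": 1.23}

# Differences in percentage points, rounded to two decimals.
reduction = {name: round(mre_mlstm[name] - mre_prophet[name], 2) for name in mre_mlstm}
```

The resulting differences for blk, memtest and busspd are 1.22, 1.01 and 0.93 percentage points, matching the abstract.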
Prediction algorithm | Training overhead | Inference overhead |
---|---|---|
ENN_PM | 91.7 | |
MLSTM_PM | 119.1 | |
CBLA_PM | 117.2 | |
Prophet | 153.7 | |
Tab. 5 Time overhead of each algorithm (s)
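One common way to obtain time overheads like these is to take wall-clock measurements around the training loop and around a prediction call. The sketch below shows that pattern with Python's `time.perf_counter`. The functions `toy_train` and `toy_infer` are hypothetical stand-ins, and the page does not say how the authors measured the values in Tab. 5.

```python
import time
from typing import Any, Callable, Tuple

def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def toy_train(steps: int) -> float:
    """Hypothetical stand-in for a training loop."""
    total = 0.0
    for i in range(steps):
        total += (i % 7) * 0.5
    return total

def toy_infer(x: float) -> float:
    """Hypothetical stand-in for a single prediction."""
    return 2.0 * x + 1.0

_, train_seconds = timed(toy_train, 100_000)
y, infer_seconds = timed(toy_infer, 3.0)
print(f"training: {train_seconds:.4f} s, inference: {infer_seconds:.6f} s")
```

For very short calls such as a single inference, averaging over many repetitions gives a steadier figure than one measurement.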