Improved multi-layer perceptron and attention model-based power consumption prediction algorithm

Journal of Computer Applications, 2025, Vol. 45, Issue (8): 2646-2655. DOI: 10.11772/j.issn.1001-9081.2024081092

Advanced computing


Chao JING¹,², Yutao QUAN¹, Yan CHEN¹,²

¹ College of Computer Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541006, China
² Guangxi Key Laboratory of Embedded Technology and Intelligent System (Guilin University of Technology), Guilin, Guangxi 541006, China

Received: 2024-08-05; Revised: 2024-10-20; Accepted: 2024-10-31; Online: 2024-11-19; Published: 2025-08-10
Contact: Yan CHEN
About the authors: JING Chao, born in 1983 in Changge, Henan, Ph. D., professor, CCF senior member. His research interests include high-performance computing and intelligent optimization algorithms.
QUAN Yutao, born in 2000 in Hengyang, Hunan, M. S. candidate. His research interests include high-performance computing.
Supported by: National Natural Science Foundation of China (62362018); Guangxi Key Research and Development Program (Guike AB23075116)

Abstract

Although heterogeneous computing systems can accelerate the processing of neural network parameters, they also significantly increase system power consumption. Accurate power consumption prediction is fundamental to optimizing power consumption in heterogeneous systems and to handling multi-type workloads. To this end, a power consumption prediction algorithm for CPU/GPU heterogeneous computing systems running multi-type workloads was proposed by improving the multi-layer perceptron-attention model. Firstly, a feature-based workload power consumption model was established by considering both server power consumption and system features. Then, since existing power consumption prediction algorithms cannot capture the long-range dependency between system features and system power consumption, an improved power consumption prediction algorithm based on the multi-layer perceptron-attention model, named Prophet, was proposed. In this algorithm, the multi-layer perceptron was modified to extract the system features at each time step, and the attention mechanism was used to combine these features, so that the long-range dependency between system features and power consumption was handled effectively. Finally, experiments were conducted on real heterogeneous systems, and the proposed algorithm was compared with power consumption prediction algorithms such as MLSTM_PM (Power consumption Model based on Multi-layer Long Short-Term Memory) and ENN_PM (Power consumption Model based on Elman Neural Network). Experimental results show that Prophet achieves higher prediction accuracy: compared with MLSTM_PM, it reduces the Mean Relative Error (MRE) on workloads blk, memtest, and busspd by 1.22, 1.01, and 0.93 percentage points, respectively, while keeping complexity relatively low. These results indicate the effectiveness and feasibility of the proposed algorithm.

Key words: heterogeneous computing system, workload feature, Multi-Layer Perceptron (MLP), attention mechanism, power consumption prediction
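The abstract describes Prophet as a modified MLP that extracts system features at each time step, followed by an attention mechanism that combines them. The pure-Python sketch below illustrates only that general idea under stated assumptions: a shared two-layer MLP per time step, a single learned query vector with scaled dot-product attention pooling, a linear output head, and untrained random weights. The class and method names are hypothetical; this is not the authors' implementation, whose number of encoder modules is varied in Fig. 4.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x, b):
    # W is a list of rows (out_dim x in_dim); returns W @ x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MlpAttentionPredictor:
    """Hypothetical per-time-step MLP + attention-pooling regressor (untrained)."""

    def __init__(self, n_features, embed_dim=16, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.embed_dim = embed_dim
        self.W1, self.b1 = init(embed_dim, n_features), [0.0] * embed_dim
        self.W2, self.b2 = init(embed_dim, embed_dim), [0.0] * embed_dim
        self.query = init(1, embed_dim)[0]  # learned query vector (assumed design)
        self.w_out = init(1, embed_dim)[0]  # linear regression head
        self.b_out = 0.0

    def embed(self, x_t):
        # The same two-layer MLP is applied to the feature vector of every time step.
        return matvec(self.W2, relu(matvec(self.W1, x_t, self.b1)), self.b2)

    def predict(self, window):
        # window: list of T feature vectors, oldest first.
        H = [self.embed(x_t) for x_t in window]
        scale = math.sqrt(self.embed_dim)
        scores = [sum(q * h for q, h in zip(self.query, h_t)) / scale for h_t in H]
        # Every time step receives a weight, so distant steps contribute directly
        # instead of through a chain of recurrent updates.
        alpha = softmax(scores)
        context = [sum(a * h_t[k] for a, h_t in zip(alpha, H)) for k in range(self.embed_dim)]
        power = sum(w * c for w, c in zip(self.w_out, context)) + self.b_out
        return power, alpha
```

For example, MlpAttentionPredictor(n_features=18).predict(window) with an 8-step window returns one scalar prediction and 8 attention weights that sum to 1. The values 16 and 8 mirror the embedding dimension and time window size in Tab. 2, while 18 simply matches the number of rows in Tab. 1; training (e.g. with the MSE loss and Adam optimizer listed in Tab. 2) is omitted.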


CLC number: TP303

Chao JING, Yutao QUAN, Yan CHEN. Improved multi-layer perceptron and attention model-based power consumption prediction algorithm[J]. Journal of Computer Applications, 2025, 45(8): 2646-2655.


Figures/Tables 15

Fig. 1 Core modules of Prophet algorithm

Tab. 1 Collected system features for CPU/GPU servers

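System metric	Unit	Description
CPU utilization	%	Utilization of each CPU core
CPU frequency	MHz	Frequency of each CPU core
Memory used	MB	System memory in use
Network I/O	%	NIC utilization
Disk I/O speed	MB·s⁻¹	Disk I/O throughput
Cache misses		Total number of cache misses across all CPU cores
Cache references		Total number of cache references across all CPU cores
L1 data cache loads		Number of loads from the CPU L1 data cache
L1 data cache stores		Number of stores to the CPU L1 data cache
Fan speed	%	Speed of the GPU cooling fan
GPU power state		GPU power state
GPU memory used	MB	Amount of GPU memory in use
GPU utilization	%	GPU utilization
PCIe TX	MB·s⁻¹	PCIe transmit rate
PCIe RX	MB·s⁻¹	PCIe receive rate
GPU temperature	℃	GPU core temperature
GPU frequency	MHz	GPU core clock frequency
GPU memory frequency	MHz	GPU memory clock frequency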
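Most of the GPU-side metrics in Tab. 1 can be sampled with nvidia-smi. This page does not say which collection tool the authors used, so the following is only an assumed sketch: it parses one line printed by nvidia-smi --query-gpu=fan.speed,pstate,memory.used,utilization.gpu,temperature.gpu,clocks.sm,clocks.mem --format=csv,noheader,nounits. PCIe TX/RX and the CPU-side rows are not covered; the CPU cache rows appear to correspond to the Linux perf events cache-misses, cache-references, L1-dcache-loads and L1-dcache-stores. The field list and the function name parse_gpu_sample are assumptions for illustration.

```python
# Assumed nvidia-smi --query-gpu fields, in the order they are requested.
GPU_FIELDS = ["fan.speed", "pstate", "memory.used", "utilization.gpu",
              "temperature.gpu", "clocks.sm", "clocks.mem"]


def parse_gpu_sample(csv_line):
    """Parse one line of --format=csv,noheader,nounits output into a dict."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(GPU_FIELDS):
        raise ValueError(f"expected {len(GPU_FIELDS)} fields, got {len(values)}")
    sample = {}
    for name, raw in zip(GPU_FIELDS, values):
        if raw in ("[N/A]", "[Not Supported]"):
            sample[name] = None  # metric not exposed by this GPU or driver
        elif name == "pstate":
            sample[name] = int(raw.lstrip("P"))  # "P0" -> 0, the highest-performance state
        else:
            sample[name] = float(raw)  # %, MB, degrees C or MHz depending on the field
    return sample
```

Sampling once per second (for instance with nvidia-smi's -l 1 loop option) and timestamping each parsed dict would give a trace that can be aligned with power-meter readings.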

Fig. 2 Overall structure of power consumption prediction model

Tab. 2 Hyperparameter setting

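Hyperparameter	Value	Hyperparameter	Value
Embedding dimension	16	Optimizer	Adam
Learning rate	1×10⁻⁴	Time window size	8
Loss function	MSE	Weight decay	1×10⁻³
Batch size	32	Early-stopping threshold	150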
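Tab. 2 sets the time window size to 8. One common way to turn a synchronized trace of feature vectors and measured power into training samples is a sliding window. The sketch below assumes that each window is labeled with the power reading at its last time step, which this page does not specify; the helper name make_windows is hypothetical.

```python
def make_windows(features, power, window=8):
    """Slide a window of `window` consecutive feature vectors over the trace and
    pair each window with the power at its final step (assumed alignment)."""
    if len(features) != len(power):
        raise ValueError("features and power must have the same length")
    X, y = [], []
    for end in range(window, len(features) + 1):
        X.append(features[end - window:end])  # time steps end-window .. end-1
        y.append(power[end - 1])              # power measured at the last step
    return X, y
```

A trace of N samples yields N - 7 windows with the default size, and batches of 32 such windows would match the batch size in Tab. 2.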

Tab. 3 Prediction MAE of each algorithm under different workloads

Algorithm	oneDNN	gemm	blk	memtest	busspd
ENN_PM	20.9	10.9	5.62	2.76	3.83
MLSTM_PM	47.8	11.2	9.78	4.22	6.06
CBLA_PM	22.7	18.8	6.49	3.88	5.30
MLR_PM	31.0	6.8	6.44	2.85	3.72
Prophet	23.0	10.0	5.54	2.55	3.23

Tab. 4 Prediction MRE (%) of each algorithm under different workloads

Algorithm	oneDNN	gemm	blk	memtest	busspd
ENN_PM	8.44	3.72	1.68	1.80	1.55
MLSTM_PM	16.80	3.82	2.88	2.65	2.16
CBLA_PM	10.10	6.31	2.05	2.31	2.46
MLR_PM	16.60	2.76	1.96	1.80	1.37
Prophet	10.50	4.01	1.66	1.64	1.23
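Tab. 3 and Tab. 4 report MAE and MRE, but their formulas are not restated on this page. Assuming the usual definitions, MAE = (1/n)Σ|yᵢ - ŷᵢ| and MRE = (100%/n)Σ|yᵢ - ŷᵢ|/|yᵢ|, the sketch below computes both and checks that the Tab. 4 values reproduce the percentage-point reductions quoted in the abstract. The function names are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the power readings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mre_percent(y_true, y_pred):
    """Mean relative error, expressed in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# MRE (%) copied from Tab. 4 for the three workloads quoted in the abstract.
MRE_MLSTM_PM = {"blk": 2.88, "memtest": 2.65, "busspd": 2.16}
MRE_PROPHET = {"blk": 1.66, "memtest": 1.64, "busspd": 1.23}


def reduction_pp(baseline, proposed):
    # Difference in percentage points, rounded to the table's two decimals.
    return {k: round(baseline[k] - proposed[k], 2) for k in baseline}


print(reduction_pp(MRE_MLSTM_PM, MRE_PROPHET))
# {'blk': 1.22, 'memtest': 1.01, 'busspd': 0.93}
```

Because MRE is already a percentage, the differences are percentage points, not relative reductions: for blk, 2.88% - 1.66% = 1.22 points, which is roughly a 42% relative drop.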

Fig. 3 Power consumption and prediction curve of each workload

Fig. 4 Prediction MAE for different number of encoder modules

Fig. 5 Performance of each algorithm under different prediction time lengths

Tab. 5 Time overhead of each algorithm

Algorithm	Training overhead	Inference overhead
ENN_PM	91.7	5.9×10⁻⁴
MLSTM_PM	119.1	1.6×10⁻³
CBLA_PM	117.2	1.1×10⁻³
Prophet	153.7	1.2×10⁻³


Fig. 6 Relationship between partial features of workload oneDNN and power consumption

Fig. 7 Relationship between partial features of workload blk and power consumption

Fig. 8 Relationship between partial features of workload gemm and power consumption

Fig. 9 Prediction performance under different workloads

Fig. 10 Prediction performance under different hardware setups

