Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (6): 1884-1892. DOI: 10.11772/j.issn.1001-9081.2022050722

• Advanced Computing •

Integrated scheduling optimization of multiple data centers based on deep reinforcement learning

Heping FANG1,2, Shuguang LIU1(), Yongyi RAN3, Kunhua ZHONG1   

  1. Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
    2. Chongqing School, University of Chinese Academy of Sciences, Chongqing 400714, China
    3. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2022-05-20  Revised: 2022-08-04  Accepted: 2022-08-08  Online: 2023-06-08  Published: 2023-06-10
  • Contact: Shuguang LIU
  • About author: FANG Heping, born in 1997 in Chongqing, M. S. candidate, CCF member. His research interests include deep reinforcement learning and green data centers.
    LIU Shuguang, born in 1971 in Jincheng, Shanxi, M. S., senior engineer. His research interests include artificial intelligence and big data. Email: liushuguang@cigit.ac.cn
    RAN Yongyi, born in 1986 in Bazhong, Sichuan, Ph. D., lecturer. His research interests include deep reinforcement learning and communication systems.
    ZHONG Kunhua, born in 1984 in Chongqing, Ph. D. candidate. His research interests include Bayesian networks and artificial intelligence.
  • Supported by:
    Science and Technology Service Network Program of Chinese Academy of Sciences (KFJ-STS-QYZD-2021-01-001)



Abstract:

The purpose of task scheduling for multiple data centers is to allocate computing tasks to suitable servers across the data centers so as to improve resource utilization and energy efficiency. To this end, an integrated scheduling strategy for multiple data centers based on deep reinforcement learning was proposed, which is divided into two stages: data center selection and task allocation within the selected data center. In the data center selection stage, computing power resources were integrated to improve the overall resource utilization. Firstly, a Deep Q Network with Prioritized Experience Replay (PER-DQN) was used to obtain the communication paths to each data center in a network with data centers as nodes. Then, the resource usage cost and the network communication cost were calculated, and the optimal data center was selected according to the principle of minimizing the sum of these two costs. In the task allocation stage, firstly, the computing tasks in the selected data center were divided and added to the scheduling queue following the First-Come First-Served (FCFS) principle. Then, taking the status of the computing devices and the ambient temperature into account, a task allocation algorithm based on Double Deep Q Network (Double DQN) was used to obtain the optimal allocation policy, which selects the server to execute each computing task, avoids the generation of hot spots, and reduces the energy consumption of the cooling equipment. Experimental results show that the average total cost of the PER-DQN based data center selection algorithm is 3.6% and 10.0% lower than those of the Computing Resource First (CRF) and Shortest Path First (SPF) path selection methods, respectively, and that the average Power Usage Effectiveness (PUE) of the Double DQN based task deployment algorithm is 2.5% and 1.7% lower than those of the Round Robin (RR) and Greedy scheduling algorithms, respectively. These results indicate that the proposed strategy can effectively reduce both the total cost and the energy consumption of the data centers, realizing the efficient operation of multiple data centers.
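The two-stage decision process described above can be sketched in a few lines. The cost terms, field names, and tabular Q-values below are illustrative assumptions for exposition only, not the paper's exact formulation (which uses neural networks trained with PER-DQN and Double DQN):

```python
# Stage 1 (sketch): choose the data center that minimizes the sum of
# resource-usage cost and network-communication cost. The two cost
# formulas here are placeholder assumptions, not the paper's definitions.
def select_data_center(centers):
    def total_cost(dc):
        resource_cost = dc["cpu_load"] * dc["price_per_cpu"]
        network_cost = dc["path_latency"] * dc["price_per_hop"]
        return resource_cost + network_cost
    return min(centers, key=total_cost)

# Stage 2 (sketch): the Double DQN target decouples action selection from
# action evaluation -- the online network picks argmax_a, while the target
# network scores that action. Dicts stand in for networks to stay runnable.
def double_dqn_target(q_online, q_target, next_state, reward, gamma=0.99):
    best_action = max(q_online[next_state], key=q_online[next_state].get)
    return reward + gamma * q_target[next_state][best_action]

centers = [
    {"name": "dc1", "cpu_load": 0.8, "price_per_cpu": 1.0,
     "path_latency": 5, "price_per_hop": 0.1},
    {"name": "dc2", "cpu_load": 0.3, "price_per_cpu": 1.0,
     "path_latency": 20, "price_per_hop": 0.1},
]
print(select_data_center(centers)["name"])  # dc1 (cost 1.3 vs 2.3)
```

The decoupling in `double_dqn_target` is what mitigates the overestimation bias of vanilla DQN, which both selects and evaluates the next action with the same network.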

Key words: deep reinforcement learning, multiple data centers, task scheduling, temperature-aware, Power Usage Effectiveness (PUE)

CLC number: