Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (6): 1884-1892. DOI: 10.11772/j.issn.1001-9081.2022050722

• Advanced Computing •

Integrated scheduling optimization of multiple data centers based on deep reinforcement learning

Heping FANG1,2, Shuguang LIU1(), Yongyi RAN3, Kunhua ZHONG1   

  1. Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
    2. Chongqing School, University of Chinese Academy of Sciences, Chongqing 400714, China
    3. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2022-05-20  Revised: 2022-08-04  Accepted: 2022-08-08  Online: 2023-06-08  Published: 2023-06-10
  • Contact: Shuguang LIU
  • About author: FANG Heping, born in 1997 in Chongqing, M. S. candidate, CCF member. His research interests include deep reinforcement learning and green data centers.
    LIU Shuguang, born in 1971 in Jincheng, Shanxi, M. S., senior engineer. His research interests include artificial intelligence and big data. Email: liushuguang@cigit.ac.cn
    RAN Yongyi, born in 1986 in Bazhong, Sichuan, Ph. D., lecturer. His research interests include deep reinforcement learning and communication systems.
    ZHONG Kunhua, born in 1984 in Chongqing, Ph. D. candidate. His research interests include Bayesian networks and artificial intelligence.
  • Supported by:
    Science and Technology Service Network Program of Chinese Academy of Sciences (KFJ-STS-QYZD-2021-01-001)



Abstract:

The purpose of task scheduling for multiple data centers is to allocate computing tasks to suitable servers across the data centers so as to improve resource utilization and energy efficiency. To this end, an integrated scheduling strategy for multiple data centers based on deep reinforcement learning was proposed, which is divided into two stages: data center selection and task allocation within the selected data center. In the data center selection stage, computing power resources were integrated to improve the overall resource utilization. Firstly, a Deep Q Network with Prioritized Experience Replay (PER-DQN) was used to obtain the communication paths to each data center in a network with data centers as nodes. Then, the resource usage cost and the network communication cost were calculated, and the optimal data center was selected according to the principle of minimizing the sum of these two costs. In the task allocation stage, firstly, the computing tasks in the selected data center were divided and added to the scheduling queue following the First-Come First-Served (FCFS) principle. Then, taking the status of the computing devices and the ambient temperature into account, a task allocation algorithm based on Double Deep Q Network (Double DQN) was used to obtain the optimal allocation policy, which selects the server to execute each computing task, avoids the generation of hot spots, and reduces the energy consumption of the cooling equipment. Experimental results show that the average total cost of the PER-DQN based data center selection algorithm is 3.6% and 10.0% lower than those of the Computing Resource First (CRF) and Shortest Path First (SPF) path selection methods, respectively, and that the average Power Usage Effectiveness (PUE) of the Double DQN based task deployment algorithm is 2.5% and 1.7% lower than those of the Round Robin (RR) and Greedy scheduling algorithms, respectively. These results indicate that the proposed strategy can effectively reduce both the total cost and the energy consumption of the data centers, realizing the efficient operation of multiple data centers.
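The two-stage decision process described above can be sketched in a few lines. The cost terms, field names, and tabular Q-values below are illustrative assumptions for exposition only, not the paper's exact formulation (which uses neural networks trained with PER-DQN and Double DQN):

```python
# Stage 1 (sketch): choose the data center that minimizes the sum of
# resource-usage cost and network-communication cost. The two cost
# formulas here are placeholder assumptions, not the paper's definitions.
def select_data_center(centers):
    def total_cost(dc):
        resource_cost = dc["cpu_load"] * dc["price_per_cpu"]
        network_cost = dc["path_latency"] * dc["price_per_hop"]
        return resource_cost + network_cost
    return min(centers, key=total_cost)

# Stage 2 (sketch): the Double DQN target decouples action selection from
# action evaluation -- the online network picks argmax_a, while the target
# network scores that action. Dicts stand in for networks to stay runnable.
def double_dqn_target(q_online, q_target, next_state, reward, gamma=0.99):
    best_action = max(q_online[next_state], key=q_online[next_state].get)
    return reward + gamma * q_target[next_state][best_action]

centers = [
    {"name": "dc1", "cpu_load": 0.8, "price_per_cpu": 1.0,
     "path_latency": 5, "price_per_hop": 0.1},
    {"name": "dc2", "cpu_load": 0.3, "price_per_cpu": 1.0,
     "path_latency": 20, "price_per_hop": 0.1},
]
print(select_data_center(centers)["name"])  # dc1 (cost 1.3 vs 2.3)
```

The decoupling in `double_dqn_target` is what mitigates the overestimation bias of vanilla DQN, which both selects and evaluates the next action with the same network.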

Key words: deep reinforcement learning, multiple data centers, task scheduling, temperature-aware, Power Usage Effectiveness (PUE)

CLC number: