Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (2): 501-508. DOI: 10.11772/j.issn.1001-9081.2018061406

• Advanced Computing •

Scheduling strategy of cloud robots based on parallel reinforcement learning

SHA Zongxuan, XUE Fei, ZHU Jie

  1. School of Information, Beijing Wuzi University, Beijing 101149, China
  • Received: 2018-07-06; Revised: 2018-08-16; Online: 2019-02-10; Published: 2019-02-15
  • Corresponding author: XUE Fei
  • About the authors: SHA Zongxuan (1990-), male, born in Bengbu, Anhui, M.S. candidate; research interests: reinforcement learning, machine learning. XUE Fei (1985-), male, born in Binzhou, Shandong, Ph.D., lecturer; research interests: pattern recognition, machine learning. ZHU Jie (1960-), male, born in Beijing, Ph.D., professor; research interests: intelligent optimization, artificial intelligence.
  • Supported by: This work is partially supported by the National Natural Science Foundation of China (71371033), the Science and Technology Program of Beijing Municipal Education Commission (KM201810037002), and the Beijing Intelligent Logistics System Collaborative Innovation Center Project (0351701301).

Abstract: To address the slow convergence of reinforcement learning tasks with large state spaces, a priority-based parallel reinforcement learning task scheduling strategy was proposed. Firstly, the convergence of Q-learning under the asynchronous parallel computing mode was proved. Secondly, complex problems were partitioned by state space, the scheduling center matched sub-problems with computing nodes according to the proposed strategy, and each computing node completed the reinforcement learning task of its sub-problem and fed the result back to the center, realizing parallel reinforcement learning on a computer cluster. Finally, an experimental environment was built on CloudSim, parameters such as the optimal step size, discount rate, and sub-problem size were determined, and the performance of the proposed strategy with different numbers of computing nodes was evaluated on practical problems. With 64 computing nodes, the efficiency of the proposed strategy was 61% and 86% higher than that of round-robin scheduling and random scheduling, respectively. Experimental results show that the proposed strategy effectively speeds up convergence under parallel computing, and that obtaining the optimal policy for a control problem with a million-state space takes about 1.6×10⁵ s.
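
For reference, the "step size" and "discount rate" tuned in the experiments are the α and γ of the standard one-step Q-learning update that each computing node applies within its slice of the state space (a textbook formula given here as background, not quoted from the paper):

```latex
% Standard one-step Q-learning update; \alpha is the step size and
% \gamma the discount rate tuned in the experiments.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \bigr]
```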
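
A minimal sketch of the scheduling loop described above: the scheduling center keeps a priority queue of sub-problems, dispatches each to a computing node, and merges the Q-values fed back asynchronously. All names here (run_q_learning, the priority values, the thread pool standing in for the node cluster) are hypothetical illustrations, not the paper's CloudSim implementation:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_q_learning(subproblem):
    """Stand-in for one node running Q-learning over its slice of the state space."""
    return {state: 0.0 for state in subproblem}  # would return the learned Q-values

def schedule(prioritized_subproblems, num_nodes):
    """Priority-based matching of sub-problems to computing nodes."""
    heapq.heapify(prioritized_subproblems)       # scheduling center's priority queue
    merged_q = {}
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:  # stands in for the cluster
        futures = []
        while prioritized_subproblems:
            _priority, sub = heapq.heappop(prioritized_subproblems)
            futures.append(pool.submit(run_q_learning, sub))  # dispatch, highest priority first
        for done in as_completed(futures):                    # asynchronous feedback
            merged_q.update(done.result())                    # merge partial Q-tables
    return merged_q

# Example: a state space partitioned into four sub-problems, solved on two nodes.
subs = [(0, ["s0", "s1"]), (1, ["s2"]), (2, ["s3", "s4"]), (3, ["s5"])]
print(schedule(subs, num_nodes=2))
```

In the paper the computing nodes are machines of a CloudSim-simulated cluster; the thread pool above only mimics the dispatch-and-feedback pattern.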

Key words: cloud robot, reinforcement learning, Q-Learning, parallel computing, task scheduling, CloudSim
