Abstract: To solve the problem of slow convergence in reinforcement learning tasks with large state spaces, a priority-based parallel reinforcement learning task scheduling strategy was proposed. Firstly, the convergence of Q-learning in the asynchronous parallel computing mode was proved. Secondly, the complex problem was divided into sub-problems according to its state space, the sub-problems were matched with computing nodes at the scheduling center, and each computing node completed the reinforcement learning task of its sub-problem and fed the result back to the center, realizing parallel reinforcement learning on a computer cluster. Finally, an experimental environment was built based on CloudSim, parameters such as the optimal step size, discount rate and sub-problem size were determined, and the performance of the proposed strategy with different numbers of computing nodes was verified on practical problems. With 64 computing nodes, the efficiency of the proposed strategy was improved by 61% and 86% compared with round-robin scheduling and random scheduling respectively. Experimental results show that the proposed scheduling strategy can effectively accelerate convergence under parallel computing, and it takes about 1.6×10⁵ s to obtain the optimal policy for a control problem with a state space of 10⁶ states.
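The abstract describes the scheme only at a high level. As an illustration, the following is a minimal sequential sketch of the idea, assuming a tabular Q-learning update with step size alpha and discount rate gamma on a toy corridor MDP; the helper names partition_states and worker_update are hypothetical, and the outer loop merely simulates, in sequence, what the scheduling center and the computing nodes would do in parallel. This is an illustrative reconstruction under those assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a 1-D corridor of N_STATES states; action 0 moves
# left, action 1 moves right; reaching the rightmost state yields reward 1.
N_STATES, ACTIONS = 100, (0, 1)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def partition_states(n_states, n_nodes):
    """Scheduling-center step: split the state space into sub-problems,
    one contiguous block per computing node (an illustrative choice)."""
    size = n_states // n_nodes
    return [range(i * size, n_states if i == n_nodes - 1 else (i + 1) * size)
            for i in range(n_nodes)]

def worker_update(Q, block, alpha=0.1, gamma=0.95, episodes=200, eps=0.1):
    """One computing node: run epsilon-greedy Q-learning from start states
    in its block, writing into the shared table Q that is fed back to the
    scheduling center."""
    for _ in range(episodes):
        s = random.choice(block)
        for _ in range(2 * N_STATES):
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s, x])
            s2, r = step(s, a)
            # Standard Q-learning update with step size alpha and
            # discount rate gamma.
            Q[s, a] += alpha * (r + gamma * max(Q[s2, x] for x in ACTIONS)
                                - Q[s, a])
            s = s2
            if r > 0:
                break

Q = defaultdict(float)            # shared Q-table
for sweep in range(5):            # center repeatedly dispatches sub-problems
    for block in partition_states(N_STATES, n_nodes=4):
        worker_update(Q, list(block))
print(max(Q[0, a] for a in ACTIONS))  # value estimate at the leftmost state
```

In the actual strategy the worker calls would run concurrently on separate nodes, with each node feeding its partial Q-table back to the center; the asynchronous convergence result mentioned in the abstract is what tolerates the stale reads of Q that such concurrent updates introduce.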