Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (2): 501-508. DOI: 10.11772/j.issn.1001-9081.2018061406

• Advanced Computing •

Scheduling strategy of cloud robots based on parallel reinforcement learning

SHA Zongxuan, XUE Fei, ZHU Jie

  1. School of Information, Beijing Wuzi University, Beijing 101149, China
  • Received: 2018-07-06; Revised: 2018-08-16; Online: 2019-02-10; Published: 2019-02-15
  • Corresponding author: XUE Fei
  • About the authors: SHA Zongxuan (1990-), male, born in Bengbu, Anhui, M.S. candidate; research interests: reinforcement learning, machine learning. XUE Fei (1985-), male, born in Binzhou, Shandong, Ph.D., lecturer; research interests: pattern recognition, machine learning. ZHU Jie (1960-), male, born in Beijing, Ph.D., professor; research interests: intelligent optimization, artificial intelligence.
  • Supported by: This work is partially supported by the National Natural Science Foundation of China (71371033), the Science and Technology Program of Beijing Municipal Education Commission (KM201810037002), and the Beijing Intelligent Logistics System Collaborative Innovation Center Project (0351701301).

Abstract: To address the slow convergence of reinforcement learning tasks with large state spaces, a priority-based parallel reinforcement learning task scheduling strategy was proposed. Firstly, the convergence of Q-learning under the asynchronous parallel computing mode was proved. Secondly, complex problems were partitioned by state space, the scheduling center matched sub-problems with computing nodes according to the proposed strategy, and each computing node completed the reinforcement learning task of its sub-problem and fed the result back to the center, realizing parallel reinforcement learning on a computer cluster. Finally, an experimental environment was built on CloudSim, parameters such as the optimal step size, discount rate, and sub-problem size were determined, and the performance of the proposed strategy with different numbers of computing nodes was evaluated on practical problems. With 64 computing nodes, the efficiency of the proposed strategy was 61% and 86% higher than that of round-robin scheduling and random scheduling, respectively. Experimental results show that the proposed strategy effectively speeds up convergence under parallel computing, and that obtaining the optimal policy for a control problem with a million-state space takes about 1.6×10⁵ s.
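
For reference, the "step size" and "discount rate" tuned in the experiments are the α and γ of the standard one-step Q-learning update that each computing node applies within its slice of the state space (a textbook formula given here as background, not quoted from the paper):

```latex
% Standard one-step Q-learning update; \alpha is the step size and
% \gamma the discount rate tuned in the experiments.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \bigr]
```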
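
A minimal sketch of the scheduling loop described above: the scheduling center keeps a priority queue of sub-problems, dispatches each to a computing node, and merges the Q-values fed back asynchronously. All names here (run_q_learning, the priority values, the thread pool standing in for the node cluster) are hypothetical illustrations, not the paper's CloudSim implementation:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_q_learning(subproblem):
    """Stand-in for one node running Q-learning over its slice of the state space."""
    return {state: 0.0 for state in subproblem}  # would return the learned Q-values

def schedule(prioritized_subproblems, num_nodes):
    """Priority-based matching of sub-problems to computing nodes."""
    heapq.heapify(prioritized_subproblems)       # scheduling center's priority queue
    merged_q = {}
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:  # stands in for the cluster
        futures = []
        while prioritized_subproblems:
            _priority, sub = heapq.heappop(prioritized_subproblems)
            futures.append(pool.submit(run_q_learning, sub))  # dispatch, highest priority first
        for done in as_completed(futures):                    # asynchronous feedback
            merged_q.update(done.result())                    # merge partial Q-tables
    return merged_q

# Example: a state space partitioned into four sub-problems, solved on two nodes.
subs = [(0, ["s0", "s1"]), (1, ["s2"]), (2, ["s3", "s4"]), (3, ["s5"])]
print(schedule(subs, num_nodes=2))
```

In the paper the computing nodes are machines of a CloudSim-simulated cluster; the thread pool above only mimics the dispatch-and-feedback pattern.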

Key words: cloud robot, reinforcement learning, Q-Learning, parallel computing, task scheduling, CloudSim
