Abstract: To solve the problem of slow convergence in reinforcement learning tasks with large state spaces, a priority-based parallel reinforcement learning task scheduling strategy was proposed. Firstly, the convergence of Q-learning in the asynchronous parallel computing mode was proved. Secondly, the complex problem was divided into sub-problems according to its state space, the sub-problems were matched with computing nodes at the scheduling center, and each computing node completed the reinforcement learning task of its sub-problem and fed the result back to the center, realizing parallel reinforcement learning on a computer cluster. Finally, an experimental environment was built based on CloudSim, parameters such as the optimal step size, discount rate and sub-problem size were determined, and the performance of the proposed strategy with different numbers of computing nodes was verified on practical problems. With 64 computing nodes, the efficiency of the proposed strategy was improved by 61% and 86% compared with round-robin scheduling and random scheduling respectively. Experimental results show that the proposed scheduling strategy can effectively accelerate convergence under parallel computing, and it takes about 1.6×10⁵ s to obtain the optimal policy for a control problem with a state space of 10⁶ states.
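The abstract describes the scheme only at a high level. As an illustration, the following is a minimal sequential sketch of the idea, assuming a tabular Q-learning update with step size alpha and discount rate gamma on a toy corridor MDP; the helper names partition_states and worker_update are hypothetical, and the outer loop merely simulates, in sequence, what the scheduling center and the computing nodes would do in parallel. This is an illustrative reconstruction under those assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP: a 1-D corridor of N_STATES states; action 0 moves
# left, action 1 moves right; reaching the rightmost state yields reward 1.
N_STATES, ACTIONS = 100, (0, 1)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def partition_states(n_states, n_nodes):
    """Scheduling-center step: split the state space into sub-problems,
    one contiguous block per computing node (an illustrative choice)."""
    size = n_states // n_nodes
    return [range(i * size, n_states if i == n_nodes - 1 else (i + 1) * size)
            for i in range(n_nodes)]

def worker_update(Q, block, alpha=0.1, gamma=0.95, episodes=200, eps=0.1):
    """One computing node: run epsilon-greedy Q-learning from start states
    in its block, writing into the shared table Q that is fed back to the
    scheduling center."""
    for _ in range(episodes):
        s = random.choice(block)
        for _ in range(2 * N_STATES):
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s, x])
            s2, r = step(s, a)
            # Standard Q-learning update with step size alpha and
            # discount rate gamma.
            Q[s, a] += alpha * (r + gamma * max(Q[s2, x] for x in ACTIONS)
                                - Q[s, a])
            s = s2
            if r > 0:
                break

Q = defaultdict(float)            # shared Q-table
for sweep in range(5):            # center repeatedly dispatches sub-problems
    for block in partition_states(N_STATES, n_nodes=4):
        worker_update(Q, list(block))
print(max(Q[0, a] for a in ACTIONS))  # value estimate at the leftmost state
```

In the actual strategy the worker calls would run concurrently on separate nodes, with each node feeding its partial Q-table back to the center; the asynchronous convergence result mentioned in the abstract is what tolerates the stale reads of Q that such concurrent updates introduce.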