[1] SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1):9-44. [2] WATKINS C J C H, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3/4):279-292. [3] 沙宗轩, 薛菲, 朱杰. 基于并行强化学习的云机器人任务调度策略[J]. 计算机应用, 2019, 39(2):501-508. (SHA Z X, XUE F, ZHU J. Scheduling strategy of cloud robots based on parallel reinforcement learning[J]. Journal of Computer Applications, 2019, 39(2):501-508.) [4] SHAKEEL P M, BASKAR S, DHULIPALA V R S, et al. Maintaining security and privacy in health care system using learning based deep-Q-networks[J]. Journal of Medical Systems, 2018, 42(10):186-186. [5] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540):529-533. [6] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of go with deep neural networks and tree search[J]. Nature, 2016, 529(7587):484-489. [7] 赵玉婷, 韩宝玲, 罗庆生. 基于deep Q-network双足机器人非平整地面行走稳定性控制方法[J]. 计算机应用, 2018, 38(9):2459-2463. (ZHAO Y T, HAN B L, LUO Q S. Walking stability control method based on deep Q-network for biped robot on uneven ground[J]. Journal of Computer Applications, 2018, 38(9):2459-2463.) [8] ALANSARY A, OKTAY O, LI Y W, et al. Evaluating reinforcement learning agents for anatomical landmark detection[J]. Medical Image Analysis, 2019, 53:156-164. [9] ZHU J, ZHU J, WANG Z, et al. Hierarchical decision and control for continuous multitarget problem:policy evaluation with action delay[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(2):464-473. [10] LIN L J. Self-improving reactive Agents based on reinforcement learning, planning and teaching[J]. Machine Learning, 1992, 8(3/4):293-321. [11] WULFING J, KUMAR S S, BOEDECKER J, et al. Adaptive long-term control of biological neural networks with deep reinforcement learning[J]. Neurocomputing, 2019, 342:66-74. [12] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9:1735-1780. [13] KIM J J, CHA S H, CHO K H, et al. Deep reinforcement learning based multi-Agent collaborated network for distributed stock trading[J]. International Journal of Grid and Distributed Computing, 2018, 11(2):11-20. [14] 朱斐, 吴文, 刘全, 等. 一种最大置信上界经验采样的深度Q网络方法[J]. 计算机研究与发展, 2018, 55(8):1694-1705.(ZHU F, WU W, LIU Q, et al. A deep Q-network method based on upper confidence bound experience sampling[J]. Journal of Computer Research and Development, 2018, 55(8):1694-1705.) [15] BRUIN T D, KOBER J, TUYLS K, et al. Experience selection in deep reinforcement learning for control[J]. Journal of Machine Learning Research, 2018, 19:1-56. [16] YOU S X, DIAO M, GAO L P. Deep reinforcement learning for target searching in cognitive electronic warfare[J]. IEEE Access, 2019, 7:37432-37447. [17] LEI X Y, ZHANG Z A, DONG P F. Dynamic path planning of unknown environment based on deep reinforcement learning[J]. Journal of Robotics, 2018, 2018:Article ID 5781591. |