[1] KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning:a survey[J]. Journal of Artificial Intelligence Research, 1996, 4(1):237-285. [2] RIEDMILLER M. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method[C]//ECML 2005:Proceedings of the 2005 European Conference on Machine Learning, LNCS 3720. Berlin:Springer, 2005:317-328. [3] LANGE S, RIEDMILLER M. Deep auto-encoder neural networks in reinforcement learning[C]//IJCNN 2010:Proceedings of the 2010 International Joint Conference on Neural Networks. Piscataway, NJ:IEEE, 2010:1-8. [4] ABTAHI F, FASEL I. Deep belief nets as function approximators for reinforcement learning[J]. Frontiers in Computational Neuroscience, 2011, 5(1):112-131. [5] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540):529-533. [6] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C/OL]//ICML 2014:Proceedings of the 31st International Conference on Machine Learning.[S.l.]:JMLR, 2014, 32:387-395[2017-12-02]. http://proceedings.mlr.press/v32/silver14.pdf. [7] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J/OL]. arXiv:1509.02971v5(2016-02-29)[2017-09-09]. https://arxiv.org/abs/1509.02971. [8] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C/OL]// ICML 2016:Proceedings of the 33rd International Conference on Machine Learning.[S.l.]:JMLR, 2016:1928-1937, arXiv:1602.01783v2(2016-06-16)[2017-12-29]. https://arxiv.org/abs/1602.01783. [9] 李彦冬,郝宗波,雷航.卷积神经网络研究综述[J].计算机应用,2016,36(9):2508-2515. (LI Y D, HAO Z B, LEI H. Convolution neural network research review[J]. Journal of Computer Applications, 2016, 36(9):2508-2515.) [10] HUANG G, LIU Z, van der MAATEN L, et al. Densely connected convolutional networks[C]//CVPR 2017:Proceedings of the 2017 IEEE Conference on Computer vision and Pattern Recognition. Washington, DC:IEEE Computer Society, 2017:2261-2269. [11] SUTTON R S, BARTO A G. Reinforcement learning:an introduction[J]. IEEE Transactions on Neural Networks, 2005, 16(1):285-286. [12] 莫建文,林士敏,张顺岚.基于TD强化学习智能博弈程序的设计与实现[J].计算机应用,2004,24(S1):287-288. (MO J W, LIN S M, ZHANG S L. Design and realization of intelligent game theory based on TD intensive learning[J]. Journal of Computer Applications, 2004, 24(S1):287-288.) [13] 王超,郭静,包振强.改进的Q学习算法在作业车间调度中的应用[J].计算机应用,2008,28(12):3268-3270. (WANG C, GUO J, BAO Z Q. Application of improved Q learning algorithm in job shop scheduling[J]. Journal of Computer Applications, 2008, 28(12):3268-3270.) [14] THEODOROU E, BUCHLI J, SCHAAL S. A generalized path integral control approach to reinforcement learning[J]. Journal of Machine Learning Research, 2010, 11:3137-3181. [15] ZHANG Q, LIN M, YANG L T, et al. Energy-efficient scheduling for real-time systems based on deep Q-learning model[J]. IEEE Transactions on Sustainable Computing, 2017:1-1. [16] WATKINS C J C H. Learning from delayed rewards[J]. Robotics & Autonomous Systems, 1989, 15(4):233-235. [17] YAROTSKY D. Error bounds for approximations with deep ReLU networks[J]. Neural Networks, 2017, 94:103-114. [18] DURYEA E, GANGER M, HU W. Exploring deep reinforcement learning with multi Q-learning[J]. Intelligent Control and Automation, 2016, 7(4):Article ID 72002. [19] 李晨溪,曹雷,张永亮,等.基于知识的深度强化学习研究综述[J].系统工程与电子技术,2017,39(11):2603-2613. (LI C X, CAO L, ZHANG Y L, et al. Overview of knowledge-based research on intensive learning[J]. Systems Engineering and Electronics, 2017, 39(11):2603-2613.) [20] SALLAB A E, ABDOU M, PEROT E, et al. Deep reinforcement learning framework for autonomous driving[J]. Electronic Imaging, 2017, 2017(19):70-76. [21] FENG Y, ZHANG H, HAO W, et al. Joint extraction of entities and relations using reinforcement learning and deep learning[J]. Computational Intelligence and Neuroscience, 2017, 2017:7643065. [22] ADAM S, BUSONIU L, BABUSKA R. Experience replay for real-time reinforcement learning control[J]. IEEE Transactions on Systems, Man & Cybernetics Part C, 2012, 42(2):201-212. [23] SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of go without human knowledge[J]. Nature, 2017, 550(7676):354-359. [24] TAMPUU A, MATⅡSEN T, KODELIA D, et al. Multiagent cooperation and competition with deep reinforcement learning[J]. PloS One, 2017, 12(4):e0172395. [25] THRUN S B. Efficient Exploration in Reinforcement Learning[R]. Pittsburgh, PA:Carnegie Mellon University, 1992. |