[1] KHAN S G, HERRMANN G, LEWIS F L, et al. Reinforcement learning and optimal adaptive control: an overview and implementation examples[J]. Annual Reviews in Control, 2012, 36(1): 42-59.
[2] 陈学松, 杨宜民. 强化学习研究综述[J]. 计算机应用研究, 2010, 27(8): 2834-2838. (CHEN X S, YANG Y M. Reinforcement learning: survey of recent work[J]. Application Research of Computers, 2010, 27(8): 2834-2838.)
[3] 赵冬斌, 邵坤, 朱圆恒, 等. 深度强化学习综述: 兼论计算机围棋的发展[J]. 控制理论与应用, 2016, 33(6): 701-717. (ZHAO D B, SHAO K, ZHU Y H, et al. Review of deep reinforcement learning and discussions on the development of computer Go[J]. Control Theory & Applications, 2016, 33(6): 701-717.)
[4] SUTTON R S, PRECUP D, SINGH S. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning[J]. Artificial Intelligence, 1999, 112(1/2): 181-211.
[5] DIETTERICH T G. Hierarchical reinforcement learning with the MAXQ value function decomposition[J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[6] PARR R E. Hierarchical control and learning for Markov decision processes[D]. Berkeley: University of California at Berkeley, 1998: 87-109.
[7] HENGST B. Discovering hierarchy in reinforcement learning with HEXQ[C]// Proceedings of the 19th International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann Publishers Inc., 2002: 243-250.
[8] MCGOVERN E A. Autonomous discovery of temporal abstractions from interaction with an environment[D]. Amherst: University of Massachusetts Amherst, 2002: 26-38.
[9] STOLLE M. Automated discovery of options in reinforcement learning[D]. Montreal: McGill University, 2004: 21-31.
[10] MEHTA N, RAY S, TADEPALLI P, et al. Automatic discovery and transfer of MAXQ hierarchies[C]// Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 648-655.
[11] 石川, 史忠植, 王茂光. 基于路径匹配的在线分层强化学习方法[J]. 计算机研究与发展, 2008, 45(9): 1470-1476. (SHI C, SHI Z Z, WANG M G. Online hierarchical reinforcement learning based on path-matching[J]. Journal of Computer Research and Development, 2008, 45(9): 1470-1476.)
[12] 沈晶. 分层强化学习方法研究[D]. 哈尔滨: 哈尔滨工程大学, 2006: 28-55. (SHEN J. Research on hierarchical reinforcement learning approach[D]. Harbin: Harbin Engineering University, 2006: 28-55.)
[13] 陈兴国, 俞扬. 强化学习及其在电脑围棋中的应用[J]. 自动化学报, 2016, 42(5): 685-695. (CHEN X G, YU Y. Reinforcement learning and its application to the game of Go[J]. Acta Automatica Sinica, 2016, 42(5): 685-695.)
[14] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete Event Dynamic Systems, 2003, 13(4): 341-379.
[15] JONG N K, STONE P. State abstraction discovery from irrelevant state variables[C]// Proceedings of the 19th International Joint Conference on Artificial Intelligence. San Francisco, CA: Morgan Kaufmann Publishers Inc., 2005: 752-757.
[16] TAKAHASHI Y, ASADA M. Multi-controller fusion in multi-layered reinforcement learning[C]// Proceedings of the 2001 International Conference on Multisensor Fusion and Integration for Intelligent Systems. Piscataway, NJ: IEEE, 2001: 7-12.
[17] STOLLE M, PRECUP D. Learning options in reinforcement learning[C]// Proceedings of the 5th International Symposium on Abstraction, Reformulation and Approximation. London: Springer-Verlag, 2002: 212-223.
[18] 苏畅, 高阳, 陈世福, 等. 基于SMDP环境的自主生成Options算法的研究[J]. 模式识别与人工智能, 2005, 18(6): 679-684. (SU C, GAO Y, CHEN S F, et al. The study of recognizing Options based on SMDP[J]. Pattern Recognition and Artificial Intelligence, 2005, 18(6): 679-684.)