[1] LAGOUDAKIS M G, PARR R. Least squares policy iteration[J]. Journal of Machine Learning Research, 2003, 4(6):1107-1149. [2] BUSONIU L, ERNST D, de SCHUTTER B, et al. Online least-squares policy iteration for reinforcement learning control[C]//Proceedings of the 2010 American Control Conference. Piscataway, NJ:IEEE, 2010:486-491. [3] 周鑫, 刘全, 傅启明, 等. 一种批量最小二乘策略迭代方法[J]. 计算机科学, 2014, 41(9):232-238. (ZHOU X, LIU Q, FU Q M, et al. Batch least-squares policy iteration[J]. Computer Science, 2014, 41(9):232-238.) [4] 傅启明, 刘全, 伏玉琛, 等. 一种高斯过程的带参近似策略迭代算法[J]. 软件学报, 2013, 24(11):2676-2686. (FU Q M, LIU Q, FU Y C, et al. Parametric approximation policy iteration algorithm based on gaussian process[J]. Journal of Software, 2013, 24(11):2676-2686.) [5] 傅启明. 强化学习中离策略算法的分析及研究[D]. 苏州:苏州大学, 2014:72-85. (FU Q M. Analysis and research on off-policy algorithms in reinforcement learning[D]. Suzhou:Soochow University, 2014:72-85.) [6] 尤树华.贝叶斯强化学习中策略迭代算法研究[D]. 苏州:苏州大学, 2016:50-57.(YOU S H. Research on policy iteration algorithm within Bayesian reinforcement learning[D]. Suzhou:Soochow University, 2016:50-57.) [7] XU X, PENG C, DAI B, et al. A kernel-based reinforcement learning approach to stochastic pole balancing control system[C]//Proceedings of the 2010 IEEE/ASME International Conference on Advanced Intelligent Mechatronics. Piscataway, NJ:IEEE, 2010:1329-1334. [8] BARRETO A M S, PRECUP D, PINEAU J. Practical kernel-based reinforcement learning[J]. Journal of Machine Learning Research, 2016(17):1-70. [9] 朱稷涵. 基于非参函数逼近的强化学习算法研究[D]. 苏州:苏州大学, 2014:18-28.(ZHU J H. Research on reinforcement learning algorithm based on nonparametric approximation[D]. Suzhou:Soochow University, 2014:18-28.) [10] 闫称. 基于测地高斯核的策略迭代强化学习[D]. 徐州:中国矿业大学, 2015:17-42.(YAN C. Policy iteration reinforcement learning based on geodesic Gaussian kernel[D]. Xuzhou:China University of Mining and Technology, 2015:17-42.) [11] 王雪松, 朱美强, 程玉虎. 强化学习原理及其应用[M]. 北京:科学出版社, 2014:58.(WANG X S, ZHU M Q, CHEN Y H. Principle and Application of Reinforcement Learning[M]. Beijing:Science Press, 2014:58.) [12] 于剑, 程乾生. 模糊聚类方法中的最佳聚类数的搜索范围[J]. 中国科学(E辑), 2002, 32(2):274-280. (YU J, CHENG Q S. The search scope of optimal cluster number in fuzzy clustering method[J]. Science in China (Series E), 2002, 32(2):274-280.) |