Nonparametric approximation policy iteration reinforcement learning based on Dyna framework
JI Ting, ZHANG Hua
Journal of Computer Applications. 2018, (5): 1230-1238. DOI: 10.11772/j.issn.1001-9081.2017102531