1 |
刘全,翟建伟,章宗长,等. 深度强化学习综述[J]. 计算机学报, 2018, 41(1): 1-27. 10.11897/SP.J.1016.2018.00001
|
|
LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1-27. 10.11897/SP.J.1016.2018.00001
|
2 |
BELLEMARE M G, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 1479-1487.
|
3 |
PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 2778-2787. 10.1109/cvprw.2017.70
|
4 |
BURDA Y, EDWARDS H, STORKEY A, et al. Exploration by random network distillation[EB/OL]. (2018-10-30) [2021-02-21]..
|
5 |
AGRAWAL P, NAIR A, ABBEEL P, et al. Learning to poke by poking: experiential learning of intuitive physics[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016:5092-5100. 10.1109/icra.2017.7989247
|
6 |
LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2019-07-05) [2021-02-21]..
|
7 |
WATKINS C J C H. Learning from delayed rewards[D]. Cambridge: University of Cambridge, King’s College, 1989:44-46.
|
8 |
GOODFELLOW I, BENGIO Y, COURVILLE A, et al. Deep Learning[M]. Cambridge: MIT Press, 2016:143-144.
|
9 |
MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-12-19) [2021-02-21].. 10.1038/nature14236
|
10 |
SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]// Proceedings of the 31st International Conference on Machine Learning. New York: JMLR.org, 2014: 387-395.
|
11 |
SUTTON R S, McALLESTER D, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]// Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999:1057-1063.
|
12 |
KAKADE S. A natural policy gradient[C]// Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Cambridge: MIT Press, 2001:1531-1538.
|
13 |
时圣苗,刘全. 采用分类经验回放的深度确定性策略梯度方法[J/OL]. 自动化学报. (2019-10-17) [2021-02-21]. .
|
|
SHI S M, LIU Q. Deep deterministic policy gradient with classified experience replay[J/OL]. Acta Automatica Sinica. (2019-10-17) [2021-02-21]. .
|
14 |
杨瑞,严江鹏,李秀. 强化学习稀疏奖励算法研究——理论与实验[J].智能系统学报, 2020, 15(5):888-899. 10.11992/tis.202003031
|
|
YANG R, YAN J P, LI X. Survey of sparse reward algorithms in reinforcement learning - theory and experiment[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5):888-899. 10.11992/tis.202003031
|
15 |
ACHIAM J, SASTRY S. Surprise-based intrinsic motivation for deep reinforcement learning[EB/OL]. (2017-03-06) [2021-02-21].. 10.48550/arXiv.1703.01732
|
16 |
SCHMIDHUBER J. Formal theory of creativity, fun, and intrinsic motivation (1990-2010)[J]. IEEE Transactions on Autonomous Mental Development, 2010, 2(3): 230-247. 10.1109/tamd.2010.2056368
|
17 |
BURDA Y, EDWARDS H, PATHAK D, et al. Large-scale study of curiosity-driven learning[EB/OL]. (2018-08-13) [2021-02-21]..
|
18 |
SCHMIDHUBER J. A possibility for implementing curiosity and boredom in model-building neural controllers[C]// Proceedings of the 1st International Conference on Simulation of Adaptive Behavior: From Animals to Animats. Cambridge: MIT Press, 1991: 222-227. 10.7551/mitpress/3115.003.0030
|
19 |
AGRAWAL P, CARREIRA J, MALIK J. Learning to see by moving[C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 37-45. 10.1109/iccv.2015.13
|
20 |
TAÏGA A A, FEDUS W, MACHADO M C, et al. On bonus based exploration methods in the arcade learning environment[EB/OL]. (2021-09-22) [2021-11-21]..
|
21 |
SCHMIDHUBER J. Formal theory of creativity, fun, and intrinsic motivation [J]. IEEE Transactions on Autonomous Mental Development, 2010, 2(3): 230-247. 10.1109/tamd.2010.2056368
|
22 |
TODOROV E, EREZ T, TASSA Y. MuJoCo: a physics engine for model-based control[C]// Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Piscataway: IEEE, 2012: 5026-5033. 10.1109/iros.2012.6386109
|