1 |
YANG R, YAN J P, LI X. Survey of sparse reward algorithms in reinforcement learning — theory and experiment[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5): 888-899. 10.11992/tis.202003031
|
2 |
LI B, YUE K Q, GAN Z G, et al. Multi-UAV cooperative task decision-making based on MADDPG[J]. Journal of Astronautics, 2021, 42(6): 757-765. 10.3873/j.issn.1000-1328.2021.06.009
|
3 |
YE D H, CHEN G B, ZHANG W, et al. Towards playing full MOBA games with deep reinforcement learning [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 621-632.
|
4 |
LI Y X. Deep reinforcement learning: an overview[EB/OL]. (2018-11-26) [2021-10-11].
|
5 |
BADIA A P, SPRECHMANN P, VITVITSKYI A, et al. Never give up: learning directed exploration strategies[EB/OL]. (2020-02-14) [2021-11-05].
|
6 |
PATHAK D, AGRAWAL P, EFROS A A, et al. Curiosity-driven exploration by self-supervised prediction[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 2778-2787.
|
7 |
OUDEYER P Y, KAPLAN F. How can we define intrinsic motivation?[C/OL]// Proceedings of the 8th International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems [2021-11-05].
|
8 |
STREHL A L, LITTMAN M L. An analysis of model-based Interval Estimation for Markov Decision Processes[J]. Journal of Computer and System Sciences, 2008, 74(8): 1309-1331. 10.1016/j.jcss.2007.08.009
|
9 |
LAI T L, ROBBINS H. Asymptotically efficient adaptive allocation rules[J]. Advances in Applied Mathematics, 1985, 6(1): 4-22. 10.1016/0196-8858(85)90002-8
|
10 |
OSTROVSKI G, BELLEMARE M G, VAN DEN OORD A, et al. Count-based exploration with neural density models[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 2721-2730.
|
11 |
BURDA Y, EDWARDS H, STORKEY A, et al. Exploration by random network distillation[EB/OL]. (2018-10-30) [2021-12-18].
|
12 |
TANG H R, HOUTHOOFT R, FOOTE D, et al. #Exploration: a study of count-based exploration for deep reinforcement learning[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 2750-2759.
|
13 |
PARISOTTO E, BA J, SALAKHUTDINOV R. Actor-mimic: deep multitask and transfer reinforcement learning[EB/OL]. (2016-02-22) [2020-11-09].
|
14 |
RUSU A A, COLMENAREJO S G, GÜLÇEHRE Ç, et al. Policy distillation[EB/OL]. (2016-01-07) [2020-09-07].
|
15 |
JIANG Y B, LIU Q, HU Z H. Actor-critic algorithm with maximum-entropy correction[J]. Chinese Journal of Computers, 2020, 43(10): 1897-1908. 10.11897/SP.J.1016.2020.01897
|
16 |
SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998: 75-76.
|
17 |
MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. 10.1038/nature14236
|
18 |
WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256. 10.1007/bf00992696
|
19 |
KONDA V R, TSITSIKLIS J N. Actor-critic algorithms [C]// Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2000: 1008-1014.
|
20 |
MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning [C]// Proceedings of the 33rd International Conference on Machine Learning. New York: JMLR.org, 2016: 1928-1937.
|
21 |
SCHULMAN J, LEVINE S, MORITZ P, et al. Trust region policy optimization [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 1889-1897.
|
22 |
SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-08-28) [2021-09-29].
|
23 |
THOMPSON W R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples[J]. Biometrika, 1933, 25(3/4): 285-294. 10.1093/biomet/25.3-4.285
|
24 |
HAARNOJA T, TANG H R, ABBEEL P, et al. Reinforcement learning with deep energy-based policies[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1352-1361.
|
25 |
OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 4033-4041.
|
26 |
BELLEMARE M G, SRINIVASAN S, OSTROVSKI G, et al. Unifying count-based exploration and intrinsic motivation [C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 1479-1487.
|
27 |
STADIE B C, LEVINE S, ABBEEL P. Incentivizing exploration in reinforcement learning with deep predictive models[EB/OL]. (2015-11-19) [2020-12-18].
|
28 |
BURDA Y, EDWARDS H, PATHAK D, et al. Large-scale study of curiosity-driven learning[EB/OL]. (2018-08-13) [2022-01-08].
|
29 |
SONG Y, CHEN Y F, HU Y J, et al. Exploring unknown states with action balance [C]// Proceedings of the 2020 IEEE Conference on Games. Piscataway: IEEE, 2020: 184-191. 10.1109/cog47356.2020.9231562
|
30 |
HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL]. (2015-03-09) [2020-12-19].
|
31 |
CZARNECKI W M, PASCANU R, OSINDERO S, et al. Distilling policy distillation[C]// Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. New York: JMLR.org, 2019: 1331-1340.
|