1 |
MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human‑level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. 10.1038/nature14236
|
2 |
刘全,翟建伟,章宗长,等. 深度强化学习综述[J]. 计算机学报, 2018, 41(1):1-27. 10.11897/SP.J.1016.2018.00001
|
|
LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1):1-27. 10.11897/SP.J.1016.2018.00001
|
3 |
TROITZSCH K G. Multi-agent systems and simulation: a survey from an application perspective[M]// UHRMACHER A M, WEYNS D. Multi-Agent Systems: Simulation and Applications. Boca Raton: CRC Press, 2009: 53-76. 10.1201/9781420070248.ch2
|
4 |
HERNANDEZ‑LEAL P, KARTAL B, TAYLOR M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi‑Agent Systems, 2019, 33(6): 750-797. 10.1007/s10458-019-09421-1
|
5 |
孙长银,穆朝絮. 多智能体深度强化学习的若干关键科学问题[J]. 自动化学报, 2020, 46(7):1301-1312. 10.16383/j.aas.c200159
|
|
SUN C Y, MU C X. Important scientific problems of multi‑agent deep reinforcement learning[J]. Acta Automatica Sinica, 2020, 46(7):1301-1312. 10.16383/j.aas.c200159
|
6 |
SUKHBAATAR S, SZLAM A, FERGUS R. Learning multiagent communication with backpropagation[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 2252-2260.
|
7 |
PENG P, WEN Y, YANG Y D, et al. Multiagent bidirectionally‑ coordinated nets: emergence of human‑level coordination in learning to play StarCraft combat games[EB/OL]. (2017-09-14) [2021-02-12].. 10.48550/arXiv.1703.10069
|
8 |
DAS A, GERVET T, ROMOFF J, et al. TarMAC: targeted multi‑ agent communication[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 1538-1546.
|
9 |
SINGH A, JAIN T, SUKHBAATAR S. Learning when to communicate at scale in multiagent cooperative and competitive tasks[EB/OL]. (2018-12-23) [2021-02-12]..
|
10 |
LIU Y, WANG W X, HU Y J, et al. Multi‑agent game abstraction via graph attention neural network[C]// Proceedings of the 34th Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 7211-7218. 10.1609/aaai.v34i05.6211
|
11 |
MAO H Y, ZHANG Z C, XIAO Z, et al. Learning multi‑agent communication with double attentional deep reinforcement learning[J]. Autonomous Agents and Multi‑Agent Systems, 2020, 34(1): No.32. 10.1007/s10458-020-09455-w
|
12 |
SU J Y, ADAMS S, BELING P. Value‑decomposition multi‑agent actor‑critics[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 11352-11360. 10.1609/aaai.v35i13.17353
|
13 |
SAMVELYAN M, RASHID T, SCHROEDER DE WITT C, et al. The StarCraft multi‑agent challenge[C]// Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC: International Foundation for Autonomous Agents and MultiAgent Systems, 2019: 2186-2188.
|
14 |
WILLIAMS R J. Simple statistical gradient‑following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256. 10.1007/bf00992696
|
15 |
LOWE R, WU Y, TAMAR A, et al. Multi‑agent actor‑critic for mixed cooperative‑competitive environments[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6382-6393.
|
16 |
LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2019-07-05) [2021-02-12]..
|
17 |
FOERSTER J N, FARQUHAR G, AFOURAS T, et al. Counterfactual multi‑agent policy gradients[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 2974-2982. 10.1609/aaai.v32i1.11794
|
18 |
ZHANG K Q, YANG Z R, LIU H, et al. Fully decentralized multi‑agent reinforcement learning with networked agents[C]// Proceedings of the 35th International Conference on Machine Learning. New York: JMLR.org, 2018: 5872-5881.
|
19 |
JIANG J C, LU Z Q. Learning attentional communication for multi-agent cooperation[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2018: 7265-7275.
|
20 |
IQBAL S, SHA F. Actor‑attention‑critic for multi‑agent reinforcement learning[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 2961-2970.
|
21 |
BERNSTEIN D S, GIVAN R, IMMERMAN N, et al. The complexity of decentralized control of Markov decision processes[J]. Mathematics of Operations Research, 2002, 27(4): 819-840. 10.1287/moor.27.4.819.297
|
22 |
SUTTON R S, McALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]// Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999: 1057-1063.
|
23 |
KONDA V R, TSITSIKLIS J N. Actor‑critic algorithms[C]// Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999: 1008-1014.
|
24 |
MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2204-2212.
|
25 |
CHO K, van MERRIËNBOER B, GU̇LÇEHRE Ç, et al. Learning phrase representations using RNN encoder‑decoder for statistical machine translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1724-1734. 10.3115/v1/d14-1179
|
26 |
XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 2048-2057. 10.1109/cvpr.2015.7298935
|
27 |
CHUNG J, GU̇LÇEHRE Ç, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling [S/OL]. (2014-12-11) [2021-10-25].. 10.1007/978-3-030-89929-5_3
|
28 |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
|