| 1 | MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human‑level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. DOI: 10.1038/nature14236 |
| 2 | 刘全, 翟建伟, 章宗长, 等. 深度强化学习综述[J]. 计算机学报, 2018, 41(1): 1-27. DOI: 10.11897/SP.J.1016.2018.00001 |
|  | LIU Q, ZHAI J W, ZHANG Z Z, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1-27. DOI: 10.11897/SP.J.1016.2018.00001 |
| 3 | TROITZSCH K G. Multi-agent systems and simulation: a survey from an application perspective[M]// UHRMACHER A M, WEYNS D. Multi-Agent Systems: Simulation and Applications. Boca Raton: CRC Press, 2009: 53-76. DOI: 10.1201/9781420070248.ch2 |
| 4 | HERNANDEZ‑LEAL P, KARTAL B, TAYLOR M E. A survey and critique of multiagent deep reinforcement learning[J]. Autonomous Agents and Multi‑Agent Systems, 2019, 33(6): 750-797. DOI: 10.1007/s10458-019-09421-1 |
| 5 | 孙长银, 穆朝絮. 多智能体深度强化学习的若干关键科学问题[J]. 自动化学报, 2020, 46(7): 1301-1312. DOI: 10.16383/j.aas.c200159 |
|  | SUN C Y, MU C X. Important scientific problems of multi‑agent deep reinforcement learning[J]. Acta Automatica Sinica, 2020, 46(7): 1301-1312. DOI: 10.16383/j.aas.c200159 |
| 6 | SUKHBAATAR S, SZLAM A, FERGUS R. Learning multiagent communication with backpropagation[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2016: 2252-2260. |
| 7 | PENG P, WEN Y, YANG Y D, et al. Multiagent bidirectionally‑coordinated nets: emergence of human‑level coordination in learning to play StarCraft combat games[EB/OL]. (2017-09-14) [2021-02-12]. DOI: 10.48550/arXiv.1703.10069 |
| 8 | DAS A, GERVET T, ROMOFF J, et al. TarMAC: targeted multi‑agent communication[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 1538-1546. |
| 9 | SINGH A, JAIN T, SUKHBAATAR S. Learning when to communicate at scale in multiagent cooperative and competitive tasks[EB/OL]. (2018-12-23) [2021-02-12]. |
| 10 | LIU Y, WANG W X, HU Y J, et al. Multi‑agent game abstraction via graph attention neural network[C]// Proceedings of the 34th Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 7211-7218. DOI: 10.1609/aaai.v34i05.6211 |
| 11 | MAO H Y, ZHANG Z C, XIAO Z, et al. Learning multi‑agent communication with double attentional deep reinforcement learning[J]. Autonomous Agents and Multi‑Agent Systems, 2020, 34(1): No.32. DOI: 10.1007/s10458-020-09455-w |
| 12 | SU J Y, ADAMS S, BELING P. Value‑decomposition multi‑agent actor‑critics[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2021: 11352-11360. DOI: 10.1609/aaai.v35i13.17353 |
| 13 | SAMVELYAN M, RASHID T, SCHROEDER DE WITT C, et al. The StarCraft multi‑agent challenge[C]// Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC: International Foundation for Autonomous Agents and MultiAgent Systems, 2019: 2186-2188. |
| 14 | WILLIAMS R J. Simple statistical gradient‑following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256. DOI: 10.1007/bf00992696 |
| 15 | LOWE R, WU Y, TAMAR A, et al. Multi‑agent actor‑critic for mixed cooperative‑competitive environments[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6382-6393. |
| 16 | LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2019-07-05) [2021-02-12]. |
| 17 | FOERSTER J N, FARQUHAR G, AFOURAS T, et al. Counterfactual multi‑agent policy gradients[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 2974-2982. DOI: 10.1609/aaai.v32i1.11794 |
| 18 | ZHANG K Q, YANG Z R, LIU H, et al. Fully decentralized multi‑agent reinforcement learning with networked agents[C]// Proceedings of the 35th International Conference on Machine Learning. New York: JMLR.org, 2018: 5872-5881. |
| 19 | JIANG J C, LU Z Q. Learning attentional communication for multi‑agent cooperation[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2018: 7265-7275. |
| 20 | IQBAL S, SHA F. Actor‑attention‑critic for multi‑agent reinforcement learning[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 2961-2970. |
| 21 | BERNSTEIN D S, GIVAN R, IMMERMAN N, et al. The complexity of decentralized control of Markov decision processes[J]. Mathematics of Operations Research, 2002, 27(4): 819-840. DOI: 10.1287/moor.27.4.819.297 |
| 22 | SUTTON R S, McALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]// Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999: 1057-1063. |
| 23 | KONDA V R, TSITSIKLIS J N. Actor‑critic algorithms[C]// Proceedings of the 12th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1999: 1008-1014. |
| 24 | MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2204-2212. |
| 25 | CHO K, van MERRIËNBOER B, GÜLÇEHRE Ç, et al. Learning phrase representations using RNN encoder‑decoder for statistical machine translation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014: 1724-1734. DOI: 10.3115/v1/d14-1179 |
| 26 | XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR.org, 2015: 2048-2057. |
| 27 | CHUNG J, GÜLÇEHRE Ç, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014-12-11) [2021-10-25]. DOI: 10.1007/978-3-030-89929-5_3 |
| 28 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010. |