[1] MNIH V,KAVUKCUOGLU K,SILVER D,et al. Human-level control through deep reinforcement learning[J]. Nature,2015,518(7540):529-533. [2] 刘全, 翟建伟, 章宗长, 等. 深度强化学习综述[J]. 计算机学报, 2018,41(1):1-27.(LIU Q,ZHAI J W,ZHANG Z Z,et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers,2018,41(1):1-27.) [3] SCHULMAN J,WOLSKI F,DHARIWAL P,et al. Proximal policy optimization algorithms[EB/OL].[2020-09-03]. https://arxiv.org/pdf/1707.06347.pdf. [4] 殷昌盛, 杨若鹏, 朱巍, 等. 多智能体分层强化学习综述[J]. 智能系统学报, 2020, 15(4):646-655.(YIN C S,YANG R P,ZHU W, et al. A survey on multi-agent hierarchical reinforcement learning[J]. CAAI Transactions on Intelligent Systems,2020, 15(4):646-655.) [5] 孙长银, 穆朝絮. 多智能体深度强化学习的若干关键科学问题[J]. 自动化学报,2020,46(7):1301-1312.(SUN C Y,MU C X. Important scientific problems of multi-agent deep reinforcement learning[J]. Acta Automatica Sinica,2020,46(7):1301-1312.) [6] 王冲, 景宁, 李军, 等. 一种基于多agent强化学习的多星协同任务规划算法[J]. 国防科技大学学报,2011,33(1):53-58. (WANG C,JING N,LI J,et al. An algorithm of cooperative multiple satellites mission planning based on multi-agent reinforcement learning[J]. Journal of National University of Defense Technology,2011,33(1):53-58.) [7] VAN DER POL E, OLIEHOEK F A. Coordinated deep reinforcement learners for traffic light control[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook,NY:Curran Associates Inc.,2016:1-8. [8] JADERBERG M,CZARNECKI W M,DUNNING I,et al. Humanlevel performance in 3D multiplayer games with population-based reinforcement learning[J]. Science,2019,364(6443):859-865. [9] NAIR R,TAMBE M,YOKOO M,et al. Taming decentralized POMDPs:towards efficient policy computation for multiagent settings[C]//Proceedings of the 18th International Joint Conference on Artificial Intelligence. San Francisco:Morgan Kaufmann Publishers Inc.,2003:705-711. [10] LAURENT G J,MATIGNON L,LE FORT-PIAT N. The world of independent learners is not Markovian[J]. International Journal of Knowledge-based and Intelligent Engineering Systems,2011,15(1):55-64. [11] LOWE R,WU Y,TAMAR A,et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook,NY:Curran Associates Inc.,2017:6382-6393. [12] FOERSTER J N, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA:AAAI,2018:2974-2982. [13] SUNEHAG P, LEVER G, GRUSLYS A, et al. Valuedecomposition networks for cooperative multi-agent learning based on team reward[C]//Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems. Richland,SC:International Foundation for Autonomous Agents and Multiagent Systems,2018:2085-2087. [14] RASHID T,SAMVELYAN M,SCHROEDER C,et al. QMIX:monotonic value function factorisation for deep multi-agent reinforcement learning[C]//Proceedings of the 35th International Conference on Machine Learning. New York:JMLR. org,2018:4295-4304. [15] SON K,KIM D,KANG W J,et al. QTRAN:learning to factorize with transformation for cooperative multi-agent reinforcement learning[C]//Proceedings of the 36th International Conference on Machine Learning. New York:JMLR. org,2019:5887-5896. [16] YAO X,WEN C,WANG Y,et al. SMIX (λ):enhancing centralized value functions for cooperative multi-agent reinforcement learning[EB/OL].[2020-09-03]. https://arxiv.org/pdf/1911.04094.pdf. [17] SUKHBAATAR S,SZLAM A,FERGUS R. Learning multiagent communication with backpropagation[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook,NY:Curran Associates Inc.,2016:2252-2260. [18] FOERSTER J,ASSAEL Y M,DE FREITAS N,et al. Learning to communicate with deep multi-agent reinforcement learning[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook, NY:Curran Associates Inc.,2016:2145-2153. [19] IQBAL S, SHA F. Actor-attention-critic for multi-agent reinforcement learning[C]//Proceedings of the 36th International Conference on Machine Learning. New York:JMLR. org,2019:2961-2970. [20] JIANG J,LU Z. Learning attentional communication for multiagent cooperation[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY:Curran Associates Inc.,2018:7265-7275. [21] JIANG J, DUN C, HUANG T, et al. Graph convolutional reinforcement learning[EB/OL].[2020-09-03]. https://arxiv.org/pdf/1810.09202.pdf. [22] LIU Y,WANG W,HU Y,et al. Multi-agent game abstraction via graph attention neural network[C]//Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto,CA:AAAI, 2020:7211-7218. [23] HAUSKNECHT M,STONE P. Deep recurrent Q-learning for partially observable MDPs[C]//Proceedings of the 2015 AAAI Fall Symposium Series. Palo Alto,CA:AAAI,2015:29-37. [24] OLIEHOEK F A, AMATO C. A Concise Introduction to Decentralized POMDPs[M]. Cham:Springer,2016:11-30. [25] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA:Associations for Computational Linguistics, 2014:1724-1734. [26] HOCHREITER S,SCHMIDHUBER J. Long short-term memory[J]. Neural Computation,1997,9(8):1735-1780. [27] HE K,ZHANG X,REN S,et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2016:770-778. [28] SAMVELYAN M,RASHID T,DE WITT C S,et al. The StarCraft multi-agent challenge[C]//Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems. Richland,SC:International Foundation for Autonomous Agents and Multiagent Systems,2019:2186-2188. [29] HAHNLOSER R H R,SARPESHKAR R,MAHOWALD M A,et al. Digital selection and analogue amplification coexist in a cortexinspired silicon circuit[J]. Nature,2000,405(6789):947-951. |