Journal of Computer Applications, 2022, Vol. 42, Issue 11: 3346-3353. DOI: 10.11772/j.issn.1001-9081.2021122169
Special Issue: The 9th CCF Big Data Conference (CCF Bigdata 2021)
Rong ZANG1, Li WANG1, Tengfei SHI2
Received: 2021-12-21
Revised: 2022-01-14
Accepted: 2022-01-24
Online: 2022-03-04
Published: 2022-11-10
Contact: Li WANG
About author: ZANG Rong, born in 1997 in Taiyuan, Shanxi, M.S. candidate. His research interests include reinforcement learning and multi-agent systems.
Rong ZANG, Li WANG, Tengfei SHI. Multi‑agent reinforcement learning based on attentional message sharing[J]. Journal of Computer Applications, 2022, 42(11): 3346-3353.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021122169
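Map | Allied units | Enemy units |
---|---|---|
2s3z | 2 Stalkers and 3 Zealots | 2 Stalkers and 3 Zealots |
1c3s5z | 1 Colossus, 3 Stalkers and 5 Zealots | 1 Colossus, 3 Stalkers and 5 Zealots |
3s5z | 3 Stalkers and 5 Zealots | 3 Stalkers and 5 Zealots |
8m | 8 Marines | 8 Marines |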
Tab. 1 Details of SMAC maps
Map | AMSAC | MSAC | Native AC | COMA | CommNet | GA‑Comm |
---|---|---|---|---|---|---|
2s3z | 47.02(41.60~54.74) | 29.82(20.35~32.37) | 30.61(21.18~39.83) | 15.19(12.95~17.68) | 4.95(3.78~6.44) | 7.34(4.34~15.25) |
1c3s5z | 41.96(32.76~46.88) | 28.25(21.67~30.71) | 26.72(21.88~31.00) | 15.29(8.38~22.01) | 0.23(0.00~0.98) | 0.22(0.00~0.72) |
3s5z | 4.21(3.56~5.15) | 1.17(0.34~2.39) | 0.76(0.09~2.05) | 0.08(0.00~0.11) | 0.01(0.00~0.03) | 0.01(0.00~0.02) |
8m | 85.06(78.68~86.75) | 90.45(89.74~91.22) | 89.51(88.59~90.70) | 84.51(83.54~85.07) | 24.57(8.71~54.24) | 45.91(28.45~54.32) |
Tab. 2 Average win rate of single independent experiment
Map | AMSAC | MSAC | Native AC | COMA | CommNet | GA‑Comm |
---|---|---|---|---|---|---|
2s3z | 92.19 | 90.63 | 87.50 | 56.25 | 34.38 | 50.00 |
1c3s5z | 100.00 | 100.00 | 100.00 | 78.13 | 6.25 | 21.88 |
3s5z | 46.88 | 31.25 | 21.88 | 6.25 | 3.13 | 3.13 |
8m | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
Tab. 3 Highest win rate in single evaluation obtained by independent experiment