Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3346-3353.DOI: 10.11772/j.issn.1001-9081.2021122169

• CCF Bigdata 2021 •

Multi‑agent reinforcement learning based on attentional message sharing

Rong ZANG1, Li WANG1, Tengfei SHI2

  1. College of Data Science, Taiyuan University of Technology, Jinzhong Shanxi 030600, China
    2. North Automatic Control Technology Institute, Taiyuan Shanxi 030006, China
  • Received: 2021-12-21  Revised: 2022-01-14  Accepted: 2022-01-24  Online: 2022-03-04  Published: 2022-11-10
  • Contact: Li WANG
  • About the authors: ZANG Rong, born in 1997, M.S. candidate. His research interests include reinforcement learning and multi-agent systems.
    WANG Li, born in 1971, Ph.D., professor, senior member of CCF. Her research interests include data mining, artificial intelligence, and machine learning. E-mail: wangli@tyut.edu.cn
    SHI Tengfei, born in 1990, M.S., engineer, member of CCF. His research interests include deep reinforcement learning.
  • Supported by:
    National Natural Science Foundation of China(61872260)


Abstract:

Communication is an important way to achieve effective cooperation among multiple agents in a non-omniscient environment. When there are a large number of agents, redundant messages may be generated during communication. To handle communication messages effectively, a multi-agent reinforcement learning algorithm based on attentional message sharing was proposed, called AMSAC (Attentional Message Sharing multi-agent Actor-Critic). Firstly, a message sharing network was built for effective communication among agents, in which information sharing was achieved through message reading and writing by the agents, thus solving the problem of lacking communication among agents in non-omniscient environments with complex tasks. Then, in the message sharing network, communication messages were processed adaptively by the attentional message sharing mechanism, which weighted messages from different agents by importance, thus solving the problem that a large-scale multi-agent system cannot effectively identify and utilize messages during communication. Moreover, in the centralized Critic network, the Native Critic was used to update the Actor network parameters according to the Temporal Difference (TD) advantage policy gradient, so that the action values of agents were evaluated effectively. Finally, during execution, each agent's distributed Actor network made decisions based on its own observations and the messages from the message sharing network. Experimental results in the StarCraft Multi-Agent Challenge (SMAC) environment show that, compared with Native Actor-Critic (Native AC), Game Abstraction Communication (GA-Comm) and other multi-agent reinforcement learning methods, AMSAC improves the average win rate by 4 to 32 percentage points in four different scenarios. The attentional message sharing mechanism of AMSAC provides a reasonable solution for processing communication messages among agents in a multi-agent system, and has broad application prospects in both transportation hub control and unmanned aerial vehicle collaboration.
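To make the message read/write and attention weighting described above concrete, the following is a minimal PyTorch sketch of attention-weighted message aggregation. It illustrates the general idea from the abstract only; it is not the authors' implementation, and all class, method, and parameter names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalMessageSharing(nn.Module):
    """Hypothetical sketch: each agent writes a message vector, then
    reads back an attention-weighted sum of all agents' messages, so
    that messages judged more important carry more weight."""

    def __init__(self, obs_dim: int, msg_dim: int):
        super().__init__()
        self.writer = nn.Linear(obs_dim, msg_dim)  # message writing
        self.query = nn.Linear(obs_dim, msg_dim)   # per-agent query
        self.key = nn.Linear(msg_dim, msg_dim)     # key per message

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (n_agents, obs_dim)
        msgs = self.writer(obs)                        # (n, msg_dim)
        q = self.query(obs)                            # (n, msg_dim)
        k = self.key(msgs)                             # (n, msg_dim)
        scores = q @ k.t() / msgs.shape[-1] ** 0.5     # (n, n) scaled dot product
        weights = F.softmax(scores, dim=-1)            # importance of each sender
        return weights @ msgs                          # message reading: (n, msg_dim)
```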
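Likewise, a hedged sketch of the TD-advantage update the abstract describes: a centralized critic evaluates the joint state, and its one-step TD error serves as the advantage weighting each agent's policy-gradient step. The `critic` and `actor` modules, the optimizers, and the assumption that `actor(obs)` returns a `torch.distributions` object are all illustrative, not the paper's API.

```python
import torch
import torch.nn.functional as F

def td_advantage_step(critic, actor, critic_opt, actor_opt,
                      joint_state, next_joint_state,
                      obs, actions, rewards, gamma=0.99):
    # Centralized critic: regress V(s) toward a one-step TD target
    # computed on the joint (global) state.
    values = critic(joint_state).squeeze(-1)
    with torch.no_grad():
        targets = rewards + gamma * critic(next_joint_state).squeeze(-1)
    critic_loss = F.mse_loss(values, targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Distributed actors: the TD error (targets - values) acts as the
    # advantage that weights each agent's action log-probability.
    advantage = (targets - values).detach()
    log_probs = actor(obs).log_prob(actions)  # assumes actor returns a distribution
    actor_loss = -(advantage * log_probs).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()
```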

Key words: multi-agent system, agent cooperation, deep reinforcement learning, agent communication, attention mechanism, policy gradient

