Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (11): 3641-3646.DOI: 10.11772/j.issn.1001-9081.2022101511

• Frontier and comprehensive applications •

UAV cluster cooperative combat decision-making method based on deep reinforcement learning

Lin ZHAO1, Ke LYU1, Jing GUO2, Chen HONG3, Xiancai XIANG1, Jian XUE1, Yong WANG4

  1. School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China
    2. College of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
    3. College of Robotics, Beijing Union University, Beijing 100101, China
    4. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-10-13  Revised: 2023-04-19  Accepted: 2023-04-21  Online: 2023-05-24  Published: 2023-11-10
  • Contact: Yong WANG (wangyong@ucas.ac.cn)
  • About author: ZHAO Lin, born in 1998 in Panjin, Liaoning, Ph. D. candidate. Her research interests include deep reinforcement learning, unmanned aerial vehicle cluster control, and game theory.
    LYU Ke, born in 1971 in Xiji, Ningxia, Ph. D., professor, CCF member. His research interests include artificial intelligence and computer vision.
    GUO Jing, born in 1997 in Xianyang, Shaanxi, M. S. His research interests include deep reinforcement learning and unmanned aerial vehicle cluster control.
    HONG Chen, born in 1974 in Qingtongxia, Ningxia, Ph. D., associate professor. His research interests include unmanned aerial vehicle cluster control.
    XIANG Xiancai, born in 1997 in Enshi, Hubei, M. S. candidate. His research interests include deep reinforcement learning and multi-agent system control.
    XUE Jian, born in 1979 in Yixing, Jiangsu, Ph. D., professor, CCF member. His research interests include multi-agent system control and image processing.
    WANG Yong, born in 1975 in Jinan, Shandong, Ph. D., research fellow. His research interests include modeling and optimization of complex systems, pattern recognition, and data mining.
  • Supported by:
    National Key Research and Development Program of China (2018AAA0100804)


Abstract:

When an Unmanned Aerial Vehicle (UAV) cluster attacks ground targets, it is divided into two formations: a strike UAV cluster that attacks the targets and an auxiliary UAV cluster that pins down the enemy. When the auxiliary UAVs choose between the action strategies of attacking aggressively and conserving strength, the mission scenario resembles a public goods game in which cooperators earn less than defectors. On this basis, a UAV cluster cooperative combat decision-making method based on deep reinforcement learning was proposed. First, a UAV cluster combat model based on the public goods game was built to simulate the conflict of interest between the individual and the group in the cooperation of intelligent UAV clusters. Then, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm was used to solve for the most reasonable combat decision of the auxiliary UAV cluster, achieving cluster victory at the minimum loss cost. Training and experiments were carried out with different numbers of UAVs. The results show that, compared with the IDQN (Independent Deep Q-Network) and ID3QN (Imitative Dueling Double Deep Q-Network) algorithms, the proposed algorithm converges best; its winning rate reaches 100% with four auxiliary UAVs, and it also significantly outperforms the comparison algorithms with other numbers of UAVs.
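As a reading aid, the following is a minimal sketch, not taken from the paper, of the kind of linear public goods payoff the abstract alludes to: each auxiliary UAV either attacks aggressively (cooperates) or conserves strength (defects). The function name and the cost and multiplier parameters are illustrative assumptions only.

    import numpy as np

    def public_goods_payoffs(actions, cost=1.0, multiplier=1.6):
        # Illustrative payoff for n auxiliary UAVs (assumed form, not the paper's model).
        # actions: 1 = aggressive attack (cooperate), 0 = conserve strength (defect).
        # With 1 < multiplier < n, a defector always earns more than a cooperator in
        # the same round, which is the individual-versus-group conflict in the abstract.
        actions = np.asarray(actions, dtype=float)
        n = actions.size
        pot = multiplier * cost * actions.sum()   # amplified joint contribution
        share = pot / n                           # shared equally by all n UAVs
        return share - cost * actions             # only cooperators pay the attack cost

    # Four auxiliary UAVs: the lone defector (last UAV) earns the most.
    print(public_goods_payoffs([1, 1, 1, 0]))     # [0.2 0.2 0.2 1.2]

MADDPG addresses such a dilemma with centralized training and decentralized execution: each UAV acts on its own observation, while a critic that sees all observations and actions scores the joint behavior during training. Below is a minimal sketch of such a centralized critic, assuming PyTorch and hypothetical dimension arguments; it illustrates the general MADDPG structure rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class CentralizedCritic(nn.Module):
        # Q(o_1..o_n, a_1..a_n): scores the joint observation-action pair of all UAVs.
        def __init__(self, obs_dim, act_dim, n_agents, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_agents * (obs_dim + act_dim), hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, joint_obs, joint_act):
            # joint_obs: (batch, n_agents*obs_dim); joint_act: (batch, n_agents*act_dim)
            return self.net(torch.cat([joint_obs, joint_act], dim=-1))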

Key words: Unmanned Aerial Vehicle (UAV), multi-cluster, public goods game, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), cooperative combat decision-making method


CLC Number: