Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (11): 3641-3646.DOI: 10.11772/j.issn.1001-9081.2022101511

• Frontier and comprehensive applications •

UAV cluster cooperative combat decision-making method based on deep reinforcement learning

Lin ZHAO1, Ke LYU1, Jing GUO2, Chen HONG3, Xiancai XIANG1, Jian XUE1, Yong WANG4

  1. School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China
    2. College of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
    3. College of Robotics, Beijing Union University, Beijing 100101, China
    4. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-10-13  Revised: 2023-04-19  Accepted: 2023-04-21  Online: 2023-05-24  Published: 2023-11-10
  • Contact: Yong WANG (wangyong@ucas.ac.cn)
  • About author: ZHAO Lin, born in 1998 in Panjin, Liaoning, Ph. D. candidate. Her research interests include deep reinforcement learning, unmanned aerial vehicle cluster control, and game theory.
    LYU Ke, born in 1971 in Xiji, Ningxia, Ph. D., professor, CCF member. His research interests include artificial intelligence and computer vision.
    GUO Jing, born in 1997 in Xianyang, Shaanxi, M. S. His research interests include deep reinforcement learning and unmanned aerial vehicle cluster control.
    HONG Chen, born in 1974 in Qingtongxia, Ningxia, Ph. D., associate professor. His research interests include unmanned aerial vehicle cluster control.
    XIANG Xiancai, born in 1997 in Enshi, Hubei, M. S. candidate. His research interests include deep reinforcement learning and multi-agent system control.
    XUE Jian, born in 1979 in Yixing, Jiangsu, Ph. D., professor, CCF member. His research interests include multi-agent system control and image processing.
    WANG Yong, born in 1975 in Jinan, Shandong, Ph. D., research fellow. His research interests include modeling and optimization of complex systems, pattern recognition, and data mining.
  • Supported by:
    National Key Research and Development Program of China (2018AAA0100804)


Abstract:

When an Unmanned Aerial Vehicle (UAV) cluster attacks ground targets, it is divided into two formations: a strike UAV cluster that attacks the targets and an auxiliary UAV cluster that pins down the enemy. When the auxiliary UAVs choose between the action strategies of attacking aggressively and conserving strength, the mission scenario resembles a public goods game in which cooperators earn less than defectors. On this basis, a UAV cluster cooperative combat decision-making method based on deep reinforcement learning was proposed. First, a UAV cluster combat model based on the public goods game was built to simulate the conflict of interest between the individual and the group in the cooperation of intelligent UAV clusters. Then, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm was used to solve for the most reasonable combat decision of the auxiliary UAV cluster, achieving cluster victory at the minimum loss cost. Training and experiments were carried out with different numbers of UAVs. The results show that, compared with the IDQN (Independent Deep Q-Network) and ID3QN (Imitative Dueling Double Deep Q-Network) algorithms, the proposed algorithm converges best; its winning rate reaches 100% with four auxiliary UAVs, and it also significantly outperforms the comparison algorithms with other numbers of UAVs.
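As a reading aid, the following is a minimal sketch, not taken from the paper, of the kind of linear public goods payoff the abstract alludes to: each auxiliary UAV either attacks aggressively (cooperates) or conserves strength (defects). The function name and the cost and multiplier parameters are illustrative assumptions only.

    import numpy as np

    def public_goods_payoffs(actions, cost=1.0, multiplier=1.6):
        # Illustrative payoff for n auxiliary UAVs (assumed form, not the paper's model).
        # actions: 1 = aggressive attack (cooperate), 0 = conserve strength (defect).
        # With 1 < multiplier < n, a defector always earns more than a cooperator in
        # the same round, which is the individual-versus-group conflict in the abstract.
        actions = np.asarray(actions, dtype=float)
        n = actions.size
        pot = multiplier * cost * actions.sum()   # amplified joint contribution
        share = pot / n                           # shared equally by all n UAVs
        return share - cost * actions             # only cooperators pay the attack cost

    # Four auxiliary UAVs: the lone defector (last UAV) earns the most.
    print(public_goods_payoffs([1, 1, 1, 0]))     # [0.2 0.2 0.2 1.2]

MADDPG addresses such a dilemma with centralized training and decentralized execution: each UAV acts on its own observation, while a critic that sees all observations and actions scores the joint behavior during training. Below is a minimal sketch of such a centralized critic, assuming PyTorch and hypothetical dimension arguments; it illustrates the general MADDPG structure rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class CentralizedCritic(nn.Module):
        # Q(o_1..o_n, a_1..a_n): scores the joint observation-action pair of all UAVs.
        def __init__(self, obs_dim, act_dim, n_agents, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_agents * (obs_dim + act_dim), hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, joint_obs, joint_act):
            # joint_obs: (batch, n_agents*obs_dim); joint_act: (batch, n_agents*act_dim)
            return self.net(torch.cat([joint_obs, joint_act], dim=-1))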

Key words: Unmanned Aerial Vehicle (UAV), multi-cluster, public goods game, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), cooperative combat decision-making method


CLC Number: