Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (8): 2636-2643. DOI: 10.11772/j.issn.1001-9081.2022071069

• Frontier and Comprehensive Applications •


Air combat maneuver decision-making of unmanned aerial vehicle based on guided Minimax-DDQN

Yu WANG(), Tianjun REN, Zilin FAN   

  1. School of Automation, Shenyang Aerospace University, Shenyang, Liaoning 110136, China
  • Received:2022-07-23 Revised:2022-11-03 Accepted:2022-11-07 Online:2023-01-15 Published:2023-08-10
  • Contact: Yu WANG
  • About author: REN Tianjun, born in 1995 in Yuncheng, Shanxi, M. S. candidate. His research interests include intelligent decision-making.
    FAN Zilin, born in 1998 in Jinzhou, Liaoning, M. S. candidate. Her research interests include machine reasoning.
  • Supported by:
    National Natural Science Foundation of China(61906125);Scientific Research Funding Project of Department of Education of Liaoning Province(LJKZ0222)


Abstract:

A guided Minimax-DDQN (Minimax-Double Deep Q-Network) algorithm was designed to address the difficulty of predicting enemy aircraft maneuver strategies and the low winning rate caused by the complex environment information and strongly adversarial nature of Unmanned Aerial Vehicle (UAV) air combat. Firstly, a guided strategy exploration mechanism was proposed on the basis of the Minimax decision-making method. Then, combined with the guided Minimax strategy, a DDQN (Double Deep Q-Network) algorithm was designed with the aim of improving the update efficiency of the Q-network. Finally, a progressive three-stage network training method was proposed, in which a better-optimized decision model was obtained through adversarial training between different decision models. Experimental results show that, compared with algorithms such as Minimax-DQN (Minimax-Deep Q-Network) and Minimax-DDQN, the proposed algorithm improves the success rate of pursuing a straight-flying target by 14% to 60%, and its winning rate against the DDQN algorithm is no less than 60%. Therefore, compared with algorithms such as DDQN and Minimax-DDQN, the proposed algorithm has stronger decision-making capability and better adaptability in highly adversarial combat environments.
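The abstract does not spell out the update rule, but combining the two named components in the standard way gives a plausible sketch: a Minimax backup treats the air-combat duel as a zero-sum game over the UAV's action and the enemy's action, while the double-Q (DDQN) decoupling selects the action with the online network and evaluates it with the target network. The function below is an illustrative assumption of such a Minimax-DDQN target, not the paper's exact formulation; the Q-values are passed in as plain nested lists indexed `[own_action][enemy_action]`.

```python
def minimax_ddqn_target(q_online_next, q_target_next, reward, gamma, done):
    """Sketch of a Minimax-DDQN backup for a zero-sum two-player game.

    q_online_next, q_target_next: nested lists giving Q(s', a, o) for the
    online and target networks, indexed [own_action][enemy_action].
    """
    # Minimax action selection with the ONLINE network: assume the enemy
    # replies adversarially (min over o), then pick the best own action.
    worst_case = [min(row) for row in q_online_next]
    a_star = max(range(len(worst_case)), key=lambda a: worst_case[a])
    # Evaluate the selected action with the TARGET network; decoupling
    # selection from evaluation is what curbs DQN's overestimation bias.
    v_next = min(q_target_next[a_star])
    return reward + gamma * (1.0 - done) * v_next
```

For example, with online Q-values `[[1, 2], [3, 0]]` the worst-case values are `[1, 0]`, so own action 0 is selected and then evaluated under the target network's worst-case enemy reply.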

Key words: Unmanned Aerial Vehicle (UAV) air combat, autonomous decision-making, deep reinforcement learning, Double Deep Q-Network (DDQN), multi-stage training

CLC Number: