Systems Engineering and Electronics ›› 2021, Vol. 43 ›› Issue (2): 443-451. doi: 10.12305/j.issn.1001-506X.2021.02.19

• Systems Engineering •

Close air combat maneuver decision based on deep stochastic game

Wen MA1, Hui LI1,2, Zhuang WANG1, Zhiyong HUANG1, Zhaoxin WU2, Xiliang CHEN3

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
    2. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China
    3. College of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2020-03-06 Online: 2021-02-01 Published: 2021-03-16
  • About the authors:
    Wen MA (1997-), female, master's student; research interests: deep reinforcement learning. E-mail: 1262578027@qq.com
    Hui LI (1970-), male, professor, Ph.D.; research interests: intelligent computing, battlefield simulation, virtual reality. E-mail: lihuib@scu.edu.cn
    Zhuang WANG (1987-), male, Ph.D. candidate; research interests: military artificial intelligence, deep reinforcement learning. E-mail: zhuang_wang@qq.com
    Zhiyong HUANG (1995-), male, master's student; research interests: deep reinforcement learning. E-mail: 771048263@qq.com
    Zhaoxin WU (1996-), male, master's student; research interests: battlefield simulation, deep reinforcement learning. E-mail: 597779499@qq.com
    Xiliang CHEN (1985-), male, associate professor, Ph.D.; research interests: deep reinforcement learning, command information system engineering. E-mail: 383618393@qq.com
  • Supported by:
    Military Equipment Pre-Research Project (31505550302)



Abstract:

To address the problem that combat information in air combat is complex and that it is difficult to perceive the situation and make decisions quickly and accurately, an algorithm combining game theory with deep reinforcement learning is proposed. Firstly, following the typical one-on-one air combat process and taking the stochastic game as the framework, a two-aircraft multi-state game model is constructed for red-blue confrontation in close air combat. Secondly, a deep Q network (DQN) is used to handle the fighter's continuous, infinite state space. Then, the Minimax algorithm is used to construct a linear program that solves the optimal value function of the stage game in each specific state, and the network is trained to approximate this value function. Finally, after training, the optimal maneuver strategy is obtained from the network's output. Air combat simulation results show that the algorithm has good adaptability and intelligence: it can effectively select favorable maneuvers in real time against the opponent's action strategy and gain a dominant position.
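The core computation described above is the stage-game step: at a given state, the network produces a payoff matrix over the two sides' maneuver choices, and the optimal (minimax) value and mixed maneuver strategy are obtained by linear programming. The following Python sketch illustrates only that step under assumed inputs; the function name minimax_stage_game, the use of scipy.optimize.linprog, and the example 3x3 payoff matrix are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the paper's code) of solving the zero-sum stage game
# max_pi min_b sum_a pi(a) * Q[a, b] by linear programming, where Q is a
# per-state payoff matrix such as one produced by a DQN head.
import numpy as np
from scipy.optimize import linprog

def minimax_stage_game(q_matrix: np.ndarray):
    """Return (value, pi) for the row player of a zero-sum matrix game.

    q_matrix: (n_own_actions, n_opp_actions) payoff matrix from the red side's view.
    pi is the optimal mixed strategy over the red side's maneuvers.
    """
    n_a, n_b = q_matrix.shape
    # Decision variables x = [pi_1, ..., pi_{n_a}, v]; linprog minimizes, so minimize -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action b: v - sum_a pi(a) * Q[a, b] <= 0.
    A_ub = np.hstack([-q_matrix.T, np.ones((n_b, 1))])
    b_ub = np.zeros(n_b)
    # The mixed strategy must sum to one.
    A_eq = np.append(np.ones(n_a), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    pi, value = res.x[:n_a], res.x[-1]
    return value, pi

if __name__ == "__main__":
    # Hypothetical 3x3 payoff matrix for one state (three maneuvers per side).
    q = np.array([[ 0.2, -0.5,  0.1],
                  [ 0.4,  0.0, -0.3],
                  [-0.1,  0.3,  0.2]])
    v, pi = minimax_stage_game(q)
    print("stage-game value:", round(float(v), 3))
    print("mixed maneuver strategy:", np.round(pi, 3))
```

In a Minimax-style DQN training loop of the kind the abstract sketches, such a stage-game value would serve as the bootstrapped target for the value network at each visited state.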

Key words: game theory, deep reinforcement learning, stochastic game, air combat strategy

CLC Number: