Systems Engineering and Electronics, 2024, Vol. 46, Issue 11: 3764-3773. doi: 10.12305/j.issn.1001-506X.2024.11.18

• Systems Engineering •

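Cluster multi-target fire planning method based on the PPO algorithm

Hucheng QIN, Yanyan HUANG, Tiande CHEN, Han ZHANG

  1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
  • Received: 2023-07-17 Online: 2024-10-28 Published: 2024-11-30
  • Corresponding author: Yanyan HUANG
  • About the authors: Hucheng QIN (born 1996), male, PhD candidate; research interests: intelligent planning, decision control, and optimization
    Yanyan HUANG (born 1973), male, professor, PhD; research interests: equipment system demonstration and system effectiveness analysis, combat effectiveness evaluation, wargaming technology, command and control information systems, emergency management, and system modeling and simulation
    Tiande CHEN (born 1994), male, PhD candidate; research interests: intelligent planning, decision control, and optimization
    Han ZHANG (born 1994), male, PhD candidate; research interests: command and control, collaborative decision-making, and emergency services
  • Funding:
    CSSC (中船) Innovation Fund (KJB2023012)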

Abstract:

To address multi-target fire planning in defensive combat scenarios under highly dynamic battlefield situations, a fire planning method based on the proximal policy optimization (PPO) algorithm is proposed. With the goal of maximizing combat effectiveness, the reinforcement learning reward function is designed from four aspects: ammunition consumption, combat effect, combat cost, and combat time. To account for the influence of the historical decision sequence on current planning, a neural network is designed on the Actor-Critic framework with a long short-term memory (LSTM) network as its core, and is trained with the PPO algorithm. The trained reinforcement learning agent then performs sequential decision-making, generating a series of coherent fire planning schemes in real time from the situation at each decision stage. Simulation results show that the agent achieves multi-target fire planning under highly dynamic situations and has a clear advantage in computational efficiency over other algorithms.

Key words: multi-target fire planning, proximal policy optimization (PPO) algorithm, long short-term memory (LSTM) network, sequential decision-making
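
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the form of the reward, so `stage_reward` assumes a simple weighted sum over the four aspects with inputs normalized to [0, 1]; the function name, the weights, and the normalization are all hypothetical. `ppo_clip_loss` implements the standard PPO clipped surrogate objective, min(r·A, clip(r, 1-ε, 1+ε)·A), negated so that it can be minimized; the value ε = 0.2 is a common default and is likewise an assumption here.

```python
import math
from typing import Sequence


def stage_reward(ammo_used: float, damage: float, cost: float, time_used: float,
                 weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Hypothetical per-stage reward (assumed form, not from the paper).

    Rewards combat effect (damage) and penalizes ammunition consumption,
    combat cost, and combat time. Inputs are assumed normalized to [0, 1].
    """
    w_ammo, w_damage, w_cost, w_time = weights
    return w_damage * damage - w_ammo * ammo_used - w_cost * cost - w_time * time_used


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective, averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The clipping caps how much a single update can gain from moving the policy away from the one that collected the data. For a positive advantage the gain stops growing once the ratio exceeds 1 + ε, while for a negative advantage an overly large ratio is still fully penalized.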

CLC number: