Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (6): 1942-1949. doi: 10.12305/j.issn.1001-506X.2022.06.21

• Guidance, Navigation and Control •

Research on deep deterministic policy gradient guidance method for reentry vehicle

Dongzi GUO1, Rong HUANG2, Hechuan XU3, Liwei SUN3, Naigang CUI1,*

  1. School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
    2. Beijing Institute of Control and Electronic Technology, Beijing 100038, China
    3. Aviation Ammunition Research Institute of China Ordnance Industry Group, Harbin 150030, China
  • Received: 2021-06-02 Online: 2022-05-30 Published: 2022-05-30
  • Corresponding author: Naigang CUI
  • About the authors: Dongzi GUO (1976—), male, researcher, Ph.D., research interests: flight vehicle design | Rong HUANG (1986—), male, engineer, Ph.D., research interests: flight vehicle trajectory optimization | Hechuan XU (1979—), male, researcher, M.S., research interests: flight vehicle control | Liwei SUN (1982—), male, senior engineer, M.S., research interests: missile guidance and control systems | Naigang CUI (1965—), male, professor, doctoral supervisor, Ph.D., research interests: flight dynamics and control, filtering theory and applications
  • Funding:
    Open Fund of the Micro/Small Spacecraft Technology Laboratory (HIT.KLOF.MST.2018028)



Abstract:

To solve the problem that traditional reentry vehicle trajectory guidance methods adapt poorly to strong disturbances and have difficulty meeting terminal constraints, a guidance method based on the deep deterministic policy gradient (DDPG) reinforcement learning framework is proposed. Networks are trained on off-line flight trajectories generated under random strong disturbances to obtain the optimal actor network for different environmental conditions, and this actor network is then used for on-line guidance trajectory planning under disturbances: the angle-of-attack and bank-angle profiles of the reentry flight are predicted periodically so that the terminal altitude, range and velocity constraints are satisfied. Simulation results show that, with the terminal altitude constraint satisfied, the maximum terminal range-to-go deviation is less than 500 m and the maximum terminal velocity deviation is less than 35 m/s. Compared with the traditional tracking guidance method, the proposed guidance method achieves considerably higher accuracy with a small computational load, and therefore has good prospects for engineering application.
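
The abstract only outlines the scheme, so the following is a minimal, hedged sketch in PyTorch of the kind of DDPG actor-critic it describes: the actor maps a reentry flight state to normalized angle-of-attack and bank-angle commands, the critic scores state-action pairs, training runs off line on trajectories flown under randomized strong disturbances, and the trained actor is queried periodically on line. The state and action definitions, network sizes, function names, and hyperparameters below are illustrative assumptions, not values taken from the paper.

    # Illustrative DDPG sketch (not the authors' code); shapes and hyperparameters are assumed.
    import torch
    import torch.nn as nn

    STATE_DIM = 5    # assumed flight state: altitude, velocity, flight-path angle, range-to-go, heading error
    ACTION_DIM = 2   # assumed action: [angle of attack, bank angle], normalized to [-1, 1]

    class Actor(nn.Module):
        """Deterministic policy: flight state -> normalized profile commands."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, ACTION_DIM), nn.Tanh(),   # bounded commands
            )
        def forward(self, s):
            return self.net(s)

    class Critic(nn.Module):
        """Action-value function Q(s, a)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, 1),
            )
        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    def ddpg_update(actor, critic, actor_t, critic_t, actor_opt, critic_opt,
                    batch, gamma=0.99, tau=0.005):
        """One DDPG step on a mini-batch (s, a, r, s2, done) sampled from off-line
        trajectories flown under randomized strong disturbances; r and done are
        (batch, 1) tensors."""
        s, a, r, s2, done = batch
        # Critic: regress Q(s, a) toward the bootstrapped target value.
        with torch.no_grad():
            q_target = r + gamma * (1.0 - done) * critic_t(s2, actor_t(s2))
        critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        # Actor: deterministic policy gradient, ascend Q(s, actor(s)).
        actor_loss = -critic(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Soft (Polyak) update of the target networks.
        with torch.no_grad():
            for t, p in zip(actor_t.parameters(), actor.parameters()):
                t.data.mul_(1.0 - tau).add_(tau * p.data)
            for t, p in zip(critic_t.parameters(), critic.parameters()):
                t.data.mul_(1.0 - tau).add_(tau * p.data)

    def online_guidance_step(actor, state):
        """Periodic on-line call: predict the angle-of-attack and bank-angle commands
        for the next guidance cycle from the current flight state."""
        with torch.no_grad():
            cmd = actor(torch.as_tensor(state, dtype=torch.float32))
        return cmd.numpy()   # still normalized; de-normalize to physical command ranges

In the scheme described by the abstract, something like ddpg_update would be applied to mini-batches drawn from the stored off-line trajectories, while online_guidance_step stands in for the periodic profile prediction performed during flight; both function names and the reward design are hypothetical.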

Key words: reentry vehicle, reinforcement learning, deep deterministic policy gradient, guidance

CLC number: