Systems Engineering and Electronics, 2022, Vol. 44, Issue (5): 1652-1661. doi: 10.12305/j.issn.1001-506X.2022.05.27

• Guidance, Navigation and Control •

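  • About the authors: Mingren HAN (born 1996), male, master's student; main research interest: intelligent spacecraft control. Yufeng WANG (born 1976), male, research professor, Ph.D.; main research interests: spacecraft attitude and orbit control, and satellite control system design and integration testing.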
  • Funding:
    National Natural Science Foundation of China (11502017)

Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning

Mingren HAN1,2, Yufeng WANG1,2,*   

  1. Beijing Institute of Control Engineering, Beijing 100094, China
    2. Science and Technology on Space Intelligent Control Laboratory, Beijing 100094, China
  • Received: 2021-07-09 Online: 2022-05-01 Published: 2022-05-16
  • Corresponding author: Yufeng WANG

Abstract:

Autonomous orbit transfer using electric thrusters is one of the key technologies for all-electric propulsion satellites. To address the orbit-raising problem of all-electric propulsion geostationary orbit (GEO) satellites, a reinforcement learning-based optimization method for time-optimal low-thrust orbit transfer strategies is proposed. It combines the generalized advantage estimator (GAE) with proximal policy optimization (PPO) and takes into account multiple orbital perturbations and the Earth-shadow constraint. To overcome the training difficulty caused by the large state space and sparse rewards, training acceleration techniques such as action output mapping and hierarchical rewards are proposed; they effectively improve training efficiency and speed up convergence. Numerical simulations and comparisons with the conventional direct, indirect, and feedback control methods show that the proposed method is simpler, more flexible, and more efficient, while ensuring the optimality of the orbit transfer time.

Key words: all-electric propulsion satellite, low-thrust orbit transfer optimization, reinforcement learning, proximal policy optimization (PPO), training acceleration method
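As a rough illustration of the two reinforcement learning components named in the abstract, the sketch below computes GAE advantages for one rollout and evaluates PPO's clipped surrogate objective. It is a generic textbook form in plain Python, not the authors' implementation. The function names, the defaults gamma = 0.99, lambda = 0.95 and clip_eps = 0.2, and the rollout layout (one bootstrap value appended to `values`) are assumptions. The orbital dynamics, action output mapping and hierarchical reward described in the paper are not modeled.

```python
import math
from typing import List, Sequence

# Hypothetical helper names: a generic GAE + PPO-clip sketch, not the paper's code.


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor
    lam: float = 0.95,    # assumed GAE lambda
) -> List[float]:
    """Generalized advantage estimation over one rollout.

    `values` has len(rewards) + 1 entries; the last is the bootstrap value
    of the state reached after the final step. `dones[t]` is True if the
    episode terminated at step t, which cuts off bootstrapping there.
    """
    T = len(rewards)
    assert len(values) == T + 1 and len(dones) == T
    advantages = [0.0] * T
    running = 0.0
    # Backward sweep: A_t = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    for t in reversed(range(T)):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clipping range
) -> float:
    """Mean clipped surrogate objective L^CLIP (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a full PPO loop, the advantages from `gae_advantages` (often normalized per batch) would be passed to `ppo_clip_objective`, and `advantages[t] + values[t]` would serve as the value-function target. The clipping keeps each policy update close to the policy that collected the rollout, which is the main reason PPO is comparatively stable to train.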
