Systems Engineering and Electronics, 2024, Vol. 46, Issue (9): 3060-3069. doi: 10.12305/j.issn.1001-506X.2024.09.18

• Systems Engineering •

Spacecraft power-signal composite network optimization algorithm based on deep reinforcement learning

Tingyu ZHANG1,2, Ying ZENG1,2,*, Nan LI3, Hongzhong HUANG1,2

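  1. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
    2. Center for System Reliability and Safety, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
    3. The 3rd Research Institute of China Electronics Technology Group Corporation, Beijing 100016, China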
  • Received: 2023-10-09 Online: 2024-08-30 Published: 2024-09-12
  • Corresponding author: Ying ZENG
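  • About the authors: Tingyu ZHANG (born 1993), male, doctoral candidate; research interests: reliability analysis of electronic devices, reliability optimization design of power supply systems, and system reliability
    Ying ZENG (born 1994), male, Ph.D., lecturer; research interests: reliability modeling of electronic products and remaining useful life prediction
    Nan LI (born 1981), male, senior engineer, Ph.D.; research interests: image processing and optoelectronic design
    Hongzhong HUANG (born 1963), male, professor, doctoral supervisor, Ph.D.; research interests: reliability design and intelligent optimization, prognostics and health management, artificial intelligence and robotics, and digital design and intelligent manufacturing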
  • Funding:
    Fundamental Research Funds for the Central Universities (ZYGX2020ZB023)

Spacecraft power-signal composite network optimization algorithm based on DRL

Tingyu ZHANG1,2, Ying ZENG1,2,*, Nan LI3, Hongzhong HUANG1,2   

  1. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
    2. Center for System Reliability and Safety, University of Electronic Science and Technology of China, Chengdu 611731, China
    3. The 3rd Research Institute of China Electronics Technology Group Corporation, Beijing 100016, China
  • Received: 2023-10-09 Online: 2024-08-30 Published: 2024-09-12
  • Contact: Ying ZENG

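Abstract:

To achieve flexible and efficient grid connection of spacecraft power systems and to maximize the use of limited energy, a composite network topology optimization model for power transmission and signal transmission is proposed based on deep reinforcement learning (DRL), and several interpretable component models built on the knowledge distillation principle are used to analyze the optimization process. First, the transformation law of the spacecraft bus voltage regulation control domain during the on-orbit operation stage is analyzed and, combined with node propagation parameters, a composite network topology model of power transmission and signal communication is established. Then, the asynchronous advantage actor-critic (A3C) algorithm is used to adaptively optimize potential operational reliability risks in aspects such as the routing distribution and topology of the signal transmission network. Finally, several interpretable components are combined to perform knowledge distillation on the trained DRL model, yielding an interpretable quantitative analysis method. The proposed method can guide the space power supply in selecting the best grid-connection scheme under the influence of random shadowing, and provides theoretical support for designing space power controllers under higher mission requirements and in complex environments.

Key words: space power system, complex network, deep reinforcement learning, reliability optimization, interpretability analysis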

Abstract:

To maximize the utilization of limited energy and achieve flexible and efficient grid connection for spacecraft power supply systems, a composite network topology optimization model for power transmission and signal communication is proposed based on deep reinforcement learning (DRL). Several interpretable component models, built on the principle of knowledge distillation, are employed to analyze the optimization process. First, the transformation law of the control domain of spacecraft bus voltage regulation during the on-orbit operation stage is analyzed, and a composite network topology model of power transmission and signal communication is established by incorporating node propagation parameters. Second, the asynchronous advantage actor-critic (A3C) algorithm is used to adaptively optimize potential operational reliability risks in the routing distribution and topology of the signal transmission network. Finally, the interpretable components are used to perform knowledge distillation on the trained DRL model, forming an interpretable quantitative analysis method. The proposed method can guide the selection of the optimal grid-connection scheme for the space power supply under random shadow effects, and it provides theoretical support for designing space power supply controllers under higher task requirements and in complex environments.

Key words: space power system, complex network theory, deep reinforcement learning (DRL), reliability optimization, interpretable analysis
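
The abstract names A3C as the optimizer but does not state its update rule. As a minimal sketch of the advantage estimate that actor-critic methods of this family typically use, the following Python computes n-step discounted returns and advantages for one rollout. The function name `n_step_advantages`, the discount factor, and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def n_step_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Return (returns, advantages) for a single rollout segment.

    rewards[t]      : reward received after step t (hypothetical values)
    values[t]       : critic's value estimate for the state at step t
    bootstrap_value : critic's estimate for the state after the last step
                      (use 0.0 if the episode terminated)
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    returns: List[float] = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it:
    # R_t = r_t + gamma * R_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()

    # Advantage: how much better the observed return was than the critic expected.
    advantages = [ret - val for ret, val in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    # Made-up rollout, purely for illustration.
    rets, advs = n_step_advantages(
        rewards=[1.0, 0.0, 2.0],
        values=[0.5, 0.5, 0.5],
        bootstrap_value=1.0,
        gamma=0.5,
    )
    print(rets, advs)
```

In an A3C-style setup, each asynchronous worker would compute such advantages on its own rollout and use them to weight the policy-gradient term, while the returns serve as regression targets for the critic. How the paper defines states, actions, and rewards for the composite network is not given in this abstract.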

CLC number: