Systems Engineering and Electronics, 2024, Vol. 46, Issue (10): 3506-3518. doi: 10.12305/j.issn.1001-506X.2024.10.27

• Guidance, Navigation and Control •

UAV formation path planning approach incorporating dynamic reward strategy

Heng TANG1, Wei SUN1,*, Lei LYU1, Ruofei HE2, Jianjun WU3, Changhao SUN4, Tianye SUN1

  1. School of Aerospace Science and Technology, Xidian University, Xi'an 710118, China
  2. The 365th Research Institute, Northwestern Polytechnical University, Xi'an 710072, China
  3. Xi'an ASN UAV Technology Co. Ltd, Xi'an 710065, China
  4. Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing 100094, China
  • Received: 2023-11-01  Online: 2024-09-25  Published: 2024-10-22
  • Corresponding author: Wei SUN
  • About the authors:
    Heng TANG (born 1998), male, master's student; research interests: reinforcement learning, UAV formation path planning
    Wei SUN (born 1980), male, professor, Ph.D.; research interests: machine understanding of perception and behavior under uncertainty in open environments, complex task planning and reasoning
    Lei LYU (born 1995), male, Ph.D. candidate; research interests: multi-UAV cooperative control, trajectory planning
    Ruofei HE (born 1982), male, associate research fellow, Ph.D.; research interests: UAV systems engineering and overall design, intelligent cooperative control of UAVs
    Jianjun WU (born 1972), male, associate research fellow, Ph.D.; research interests: UAV flight control systems and overall design
    Changhao SUN (born 1987), male, senior engineer, Ph.D.; research interests: game-theoretic learning, distributed cooperative decision-making theory and applications
    Tianye SUN (born 1995), male, Ph.D. candidate; research interests: multi-UAV systems, UAV path planning
  • Funding:
    China University Industry-University-Research Innovation Fund (2021ZYA08004); Xi'an Science and Technology Plan Project (2022JH-RGZN-0039); Key Industry Innovation Chain Project of the Shaanxi Province Key Research and Development Program (2022ZDLGY03-01); National Natural Science Foundation of China (62173330)

Abstract:

To address unmanned aerial vehicle (UAV) formation path planning in unknown dynamic environments, this paper proposes an intelligent decision-making scheme for UAV formations. The scheme is based on the multi-agent twin delayed deep deterministic policy gradient algorithm incorporating a dynamic formation reward function (MATD3-IDFRF). First, the sparse reward function is extended for obstacle-free environments. Then, dynamic formation, a central concern in UAV formation path planning, is analyzed in depth. Under dynamic formation, the UAVs fly in a stable structure while fine-tuning their formation in response to the surrounding environment. In essence, the distance between every pair of UAVs should stay relatively stable yet still be adjusted to suit the external environment. Accordingly, a reward function based on the optimal and current distances between each pair of UAVs is designed. On this basis, a dynamic formation reward function is proposed and combined with the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm to obtain the MATD3-IDFRF algorithm. Finally, comparative experiments in a composite obstacle environment show that the proposed dynamic formation reward function raises the algorithm's success rate by 6.8%, increases the average converged reward by 2.3%, and reduces the formation deformation rate by 97%.

Key words: reinforcement learning (RL), reward function, unmanned aerial vehicle (UAV), dynamic formation, path planning
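The abstract says only that the sparse reward function is extended for obstacle-free environments, and it does not give the extended form. One common way to do this is to keep the sparse goal bonus and add a dense progress term. The sketch below illustrates that idea under stated assumptions. The function `extended_sparse_reward` and its parameters (`goal_radius`, `goal_bonus`, `progress_gain`, `step_cost`) are hypothetical and are not the paper's formula.

```python
import math

def extended_sparse_reward(prev_pos, cur_pos, goal, goal_radius=1.0,
                           goal_bonus=100.0, progress_gain=1.0, step_cost=0.1):
    """Hypothetical extension of a sparse goal reward for an obstacle-free map.

    Sparse part: goal_bonus is paid only once the UAV is within goal_radius of goal.
    Dense part (the assumed extension): reward the reduction in distance to the goal
    achieved in this step, minus a small per-step cost.
    Returns (reward, reached_goal).
    """
    d_prev = math.dist(prev_pos, goal)  # distance to goal before the step
    d_cur = math.dist(cur_pos, goal)    # distance to goal after the step
    reached = d_cur <= goal_radius
    reward = progress_gain * (d_prev - d_cur) - step_cost
    if reached:
        reward += goal_bonus
    return reward, reached
```

For example, a step from (0, 0, 0) to (1, 0, 0) toward a goal at (10, 0, 0) earns about 0.9 with the defaults: 1.0 for progress minus the 0.1 step cost. The dense term gives the agent a learning signal long before it first reaches the goal. A purely sparse reward cannot provide that signal.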
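The core idea in the abstract is a reward computed from the optimal and current distances between every pair of UAVs. This reward still leaves room for the spacing to be fine-tuned. The minimal sketch below assumes a tolerance band `tol`, inside which deviations are not penalized, and a Gaussian falloff of width `sigma` beyond that band. The function name, both parameters, and the Gaussian shape are illustrative assumptions, not the authors' reward.

```python
import itertools
import math

def formation_reward(positions, optimal_dist, tol=0.5, sigma=1.0):
    """Hypothetical pairwise formation reward (not the paper's exact formula).

    positions: list of (x, y, z) tuples, one per UAV.
    optimal_dist: dict mapping (i, j) with i < j to the desired spacing d*_ij.
    tol: deviation tolerated without penalty, modelling the "fine-tuning"
         of spacing that the abstract allows.
    sigma: width of the Gaussian-shaped falloff beyond the tolerance band.
    Returns the mean per-pair reward in (0, 1]; 1.0 means every pair is within tol.
    """
    if len(positions) < 2:
        raise ValueError("need at least two UAVs")
    rewards = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        d_cur = math.dist(positions[i], positions[j])  # current spacing d_ij
        # Only the part of the deviation outside the tolerance band is penalized.
        excess = max(0.0, abs(d_cur - optimal_dist[(i, j)]) - tol)
        rewards.append(math.exp(-(excess ** 2) / (2 * sigma ** 2)))
    return sum(rewards) / len(rewards)
```

The tolerance band is one simple way to let the formation flex around obstacles without being penalized. A purely quadratic penalty on the deviation would instead push every pair back toward its optimal spacing at all times.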
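The abstract reports a 97% reduction in the formation deformation rate but does not define the metric in this section. One plausible definition, assumed purely for illustration, is the fraction of time steps at which any pair of UAVs deviates from its optimal spacing by more than a threshold. The names `deformation_rate` and `threshold` are hypothetical.

```python
import itertools
import math

def deformation_rate(trajectory, optimal_dist, threshold=1.0):
    """Hypothetical formation deformation rate: the share of time steps at which
    at least one UAV pair deviates from its optimal spacing by more than threshold.

    trajectory: list of time steps; each step is a list of (x, y, z) positions.
    optimal_dist: dict mapping (i, j) with i < j to the desired spacing.
    """
    if not trajectory:
        return 0.0
    deformed_steps = 0
    for positions in trajectory:
        pairs = itertools.combinations(range(len(positions)), 2)
        # A step counts as deformed if any single pair leaves the allowed band.
        if any(abs(math.dist(positions[i], positions[j]) - optimal_dist[(i, j)]) > threshold
               for i, j in pairs):
            deformed_steps += 1
    return deformed_steps / len(trajectory)
```

Because the value depends on `threshold`, a comparison between reward functions is only meaningful when both use the same threshold.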
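MATD3 applies the ideas of TD3 to multiple agents, typically with centralized critics that see all agents' observations and actions. The abstract does not restate the update rule. The sketch below therefore shows the standard TD3 critic target, which combines clipped double-Q learning with target policy smoothing, as it is commonly applied to one agent in MATD3. Several details are assumptions for illustration: scalar actions, the callable interfaces for actors and critics, and the default hyperparameters. The defaults `noise_std=0.2` and `noise_clip=0.5` are the values used in the original TD3 paper.

```python
import random

def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))

def matd3_target(reward, done, next_obs, target_actors, target_critic1, target_critic2,
                 gamma=0.99, noise_std=0.2, noise_clip=0.5, a_low=-1.0, a_high=1.0,
                 rng=random):
    """TD3-style critic target for one agent's centralized twin critics.

    next_obs: list of next observations, one per agent.
    target_actors: list of callables obs -> scalar action (assumed interface).
    target_critic1/2: callables (next_obs, joint_action) -> Q value (assumed interface).
    rng: any object with a gauss(mu, sigma) method, e.g. random.Random(seed).
    """
    # Target policy smoothing: add clipped Gaussian noise to every agent's target action.
    joint_action = []
    for actor, obs in zip(target_actors, next_obs):
        noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
        joint_action.append(clip(actor(obs) + noise, a_low, a_high))
    # Clipped double Q: use the smaller of the two target critics to curb overestimation.
    q_next = min(target_critic1(next_obs, joint_action),
                 target_critic2(next_obs, joint_action))
    # Bootstrap only if the episode has not terminated.
    return reward + gamma * (1.0 - float(done)) * q_next
```

This sketch does not show the "delayed" part of TD3. That part belongs to the training loop, which updates the actors and target networks only once every few critic updates (every 2 in the original TD3 paper).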
