Systems Engineering and Electronics (系统工程与电子技术), 2023, Vol. 45, Issue (5): 1451-1460. doi: 10.12305/j.issn.1001-506X.2023.05.21

• Guidance, Navigation and Control •

基于DQN的旋翼无人机着陆控制算法

唐进1,2, 梁彦刚1,2,*, 白志会3, 黎克波1,2   

  1. 国防科技大学空天科学学院, 湖南 长沙 410073
    2. 空天任务智能规划与仿真湖南省重点实验室, 湖南 长沙 410073
    3. 中国人民解放军 31102 部队, 江苏 南京 210000
  • 收稿日期:2022-06-01 出版日期:2023-04-21 发布日期:2023-04-28
  • 通讯作者: 梁彦刚
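  • About the authors: Jin TANG (唐进, born 1997), male, master's student; main research interests: flight vehicle dynamics and control, deep reinforcement learning
    Yangang LIANG (梁彦刚, born 1979), male, professor, Ph.D.; main research interests: overall flight vehicle design and system simulation, flight vehicle dynamics and control
    Zhihui BAI (白志会, born 1996), male, research intern, M.S.; main research interests: flight vehicle dynamics and control, deep reinforcement learning
    Kebo LI (黎克波, born 1986), male, associate research fellow, Ph.D.; main research interests: missile guidance and control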

Landing control algorithm of rotor UAV based on DQN

Jin TANG1,2, Yangang LIANG1,2,*, Zhihui BAI3, Kebo LI1,2   

  1. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
    2. Hunan Key Laboratory of Intelligent Planning and Simulation for Aerospace Mission, Changsha 410073, China
    3. Unit 31102 of the PLA, Nanjing 210000, China
  • Received: 2022-06-01 Online: 2023-04-21 Published: 2023-04-28
  • Contact: Yangang LIANG

摘要:

针对无人机的着陆控制问题,研究了一种基于深度强化学习理论的旋翼无人机着陆控制算法。利用深度强化学习训练生成无人机智能体,根据观测结果给出动作指令,以实现自主着陆控制。首先, 基于随机过程理论,将旋翼无人机的着陆控制问题转化为马尔可夫决策过程。其次, 设计分别考虑无人机横向和纵向控制过程的奖励函数,将着陆控制问题转入强化学习框架。然后, 采用深度Q网络(deep Q network, DQN)算法求解该强化学习问题,通过大量训练得到着陆控制智能体。最后, 通过多种工况下的着陆平台进行大量的数值模拟和仿真分析,验证了算法的有效性。

关键词: 深度强化学习, 马尔可夫决策过程, 深度Q网络算法, 旋翼无人机, 着陆控制

Abstract:

This paper studies a landing control algorithm for rotor unmanned aerial vehicles (UAVs) based on deep reinforcement learning (DRL). A UAV agent is trained with DRL and issues action commands according to its observations to achieve autonomous landing control. First, based on stochastic process theory, the landing control problem of a rotor UAV is formulated as a Markov decision process (MDP). Second, reward functions are designed that treat the horizontal and vertical control processes of the UAV separately, which casts the landing control problem in the reinforcement learning framework. Then, the deep Q network (DQN) algorithm is used to solve the resulting reinforcement learning problem, and the landing control agent is obtained through extensive training. Finally, the effectiveness of the algorithm is verified through extensive numerical simulation and analysis with landing platforms under various operating conditions.

Key words: deep reinforcement learning (DRL), Markov decision process (MDP), deep Q network (DQN) algorithm, rotor unmanned aerial vehicle, landing control
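
The abstract names three ingredients: an MDP formulation, a reward that treats horizontal and vertical control separately, and DQN as the solver. As a rough illustration only, the sketch below shows what these pieces can look like in plain Python. The reward form, the weights `w_h` and `w_v`, and the function names `landing_reward`, `dqn_target`, and `epsilon_greedy` are assumptions made for this example and are not taken from the paper; only the target `r + gamma * max Q(s', a')` and epsilon-greedy action selection are standard parts of DQN.

```python
import math
import random


def landing_reward(dx, dy, dz, w_h=1.0, w_v=0.5):
    """Hypothetical shaped reward (not the paper's): penalize the horizontal
    offset from the pad center and the remaining height as separate terms."""
    horizontal = -w_h * math.hypot(dx, dy)  # horizontal distance to the pad
    vertical = -w_v * abs(dz)               # height above the pad
    return horizontal + vertical


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN regression target for one transition.
    next_q_values are the target network's Q-values for the next state;
    terminal transitions do not bootstrap."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random discrete action with probability epsilon,
    otherwise the action with the largest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In a full agent, `dqn_target` would be computed for each transition sampled from a replay buffer and used as the regression target for the Q-network, while `epsilon_greedy` would choose among the discrete action commands during training.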

CLC number: