Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (12): 3685-3695. doi: 10.12305/j.issn.1001-506X.2022.12.12

• Sensors and Signal Processing •

Multi-function radar intelligent jamming decision method based on prior knowledge

Bakun ZHU1,2,*, Weigang ZHU1, Wei LI3, Ying YANG3, Tianhao GAO3

  1. Department of Electronic and Optical Engineering, Space Engineering University, Beijing 101416, China
  2. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, Luoyang 471032, China
  3. Company of Postgraduate Management, Space Engineering University, Beijing 101416, China
  • Received: 2021-07-16  Online: 2022-11-14  Published: 2022-11-24
  • Corresponding author: Bakun ZHU
  • About the authors:
    Bakun ZHU (born 1997), male, master's student; research interests: cognitive electronic warfare and radar countermeasures.
    Weigang ZHU (born 1973), female, professor, Ph.D.; research interests: modern signal processing, space information countermeasures, and cognitive electronic warfare.
    Wei LI (born 1994), male, master's student; research interest: radar emitter identification.
    Ying YANG (born 1997), female, master's student; research interest: radar signal processing.
    Tianhao GAO (born 1997), male, master's student; research interest: radar emitter identification.
  • Funding:
    Project of the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE2020Z0203B)

Abstract:

To address the long training period and slow convergence of reinforcement-learning-based jamming decision methods for multi-function radar, this paper proposes an intelligent multi-function radar jamming decision algorithm based on prior knowledge. The proposed algorithm applies potential-based reward shaping theory and uses prior knowledge to design the reward function; compared with the traditional algorithm, it converges faster. Using prior knowledge to accelerate convergence is of great significance for the practical application of reinforcement learning to multi-function radar jamming decision-making, and it also offers a useful reference for applying reinforcement learning in other fields.

Key words: radar confrontation, Markov decision process (MDP), reinforcement learning, reward shaping, prior knowledge
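
The abstract refers to shaping based on a potential function without giving the formula. As a minimal sketch, assuming the standard potential-based form r + γΦ(s') − Φ(s), the Python below shows how a potential built from prior knowledge changes the reward signal. The integer state encoding and `example_potential` are hypothetical illustrations, not the states or potential used in the paper.

```python
from typing import Callable, Sequence

State = int  # assumption: states are encoded as integers for illustration


def shaped_reward(reward: float, state: State, next_state: State,
                  potential: Callable[[State], float], gamma: float) -> float:
    """Return r + gamma * Phi(s') - Phi(s), the potential-based shaped reward."""
    return reward + gamma * potential(next_state) - potential(state)


def discounted_shaping_sum(states: Sequence[State],
                           potential: Callable[[State], float],
                           gamma: float) -> float:
    """Sum the discounted shaping bonuses along a trajectory s_0, ..., s_T."""
    total = 0.0
    for t in range(len(states) - 1):
        # Pass a zero environment reward so only the shaping bonus is counted.
        bonus = shaped_reward(0.0, states[t], states[t + 1], potential, gamma)
        total += (gamma ** t) * bonus
    return total


def example_potential(state: State) -> float:
    """Hypothetical prior knowledge: a higher state index is assumed to be
    closer to the jammer's goal, so it receives a higher potential."""
    return float(state)


if __name__ == "__main__":
    trajectory = [0, 2, 1, 3]
    gamma = 0.9
    # The bonuses telescope to gamma**T * Phi(s_T) - Phi(s_0).
    print(discounted_shaping_sum(trajectory, example_potential, gamma))
    print(gamma ** 3 * example_potential(3) - example_potential(0))
```

Because the discounted bonuses telescope to γ^T Φ(s_T) − Φ(s_0), the shaping term depends only on the start and end states of a trajectory. That is why this form can speed up learning without changing which policy is optimal.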

CLC number: