Systems Engineering and Electronics, 2023, Vol. 45, Issue (5): 1518-1525. doi: 10.12305/j.issn.1001-506X.2023.05.29

• Communications and Networks •

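Fast communication jamming decision-making method based on multi-parameter joint stepwise discretization

Licheng YE, Jun WANG, Shaoqing MAO, Shuai LIU

  1. School of Information Science and Engineering, Harbin Institute of Technology (Weihai), Weihai 264209, Shandong, China
  • Received: 2022-02-24 Online: 2023-04-21 Published: 2023-04-28
  • Corresponding author: Jun WANG
  • About the authors: Licheng YE (b. 1997), male, master's student; research interests: intelligent jamming decision-making, reinforcement learning
    Jun WANG (b. 1976), male, associate professor, PhD; research interests: polarized array signal processing, intelligent radar countermeasure methods, sparse signal processing
    Shaoqing MAO (b. 1997), male, master's student; research interests: intelligent jamming decision-making, reinforcement learning
    Shuai LIU (b. 1980), male, associate professor, PhD; research interests: conformal arrays, intelligently optimized polarization-DOA parameter estimation for polarization-sensitive arrays, robust beamforming algorithms for conventional and polarization-sensitive arrays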

Abstract:

In a real-time, unknown communication environment, the key to intelligent jamming is for the jammer to find the optimal jamming strategy as quickly as possible through autonomous interactive learning. Existing jamming decision optimization methods based on reinforcement learning often need a large number of interactions before they approach the optimum. Moreover, the joint optimization of multiple parameters required in communication countermeasures greatly enlarges the jamming decision space, which makes existing reinforcement learning methods difficult to apply in time-limited countermeasure environments. This paper proposes a jamming bandit based on stepwise discretization (JBSD) method. It refines and shrinks the selection space of multiple jamming parameters by discretizing them stepwise, and it eliminates low-reward jamming parameters through a jamming arm pruning mechanism. Numerical simulation results show that, in a time-limited real-time jamming environment, the proposed method searches for the optimal jamming strategy faster and achieves a higher average jamming reward.

Key words: communication, jamming, reinforcement learning, fast decision, stepwise discretization
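
The abstract names two ingredients, stepwise discretization of the joint parameter space and pruning of low-reward arms, but gives no algorithmic detail. The Python sketch below is therefore only a generic illustration of that idea and not the JBSD algorithm itself. Everything in it is an assumption made for illustration: the two normalized parameters (`power`, `offset`), the reward model `toy_reward`, the confidence-radius constant `c`, the grid size, and the rule of zooming into one cell on each side of the current winner.

```python
import math
import random
from itertools import product


def make_grid(bounds, points_per_dim):
    """Cell midpoints of an even grid over a box; each tuple is one joint arm."""
    axes = []
    for lo, hi in bounds:
        step = (hi - lo) / points_per_dim
        axes.append([lo + step * (i + 0.5) for i in range(points_per_dim)])
    return list(product(*axes))


def run_level(reward_fn, arms, pulls, c):
    """UCB-style selection with pruning on one grid; returns the best survivor."""
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    active = list(arms)

    def radius(a, t):
        # Heuristic confidence radius; c is an assumed tuning constant.
        return c * math.sqrt(math.log(t) / counts[a])

    for t in range(1, pulls + 1):
        untried = [a for a in active if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(active, key=lambda a: means[a] + radius(a, t))
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        if all(counts[a] > 0 for a in active):
            # Drop arms whose optimistic estimate is below the best pessimistic one.
            best_lcb = max(means[a] - radius(a, t) for a in active)
            active = [a for a in active if means[a] + radius(a, t) >= best_lcb]
    return max(active, key=lambda a: means[a])


def stepwise_bandit(reward_fn, bounds, levels=3, points_per_dim=4,
                    pulls_per_level=400, c=0.2):
    """Coarse-to-fine search: solve a coarse bandit, then re-grid around its winner."""
    outer = list(bounds)
    box = list(bounds)
    best = None
    for _ in range(levels):
        best = run_level(reward_fn, make_grid(box, points_per_dim),
                         pulls_per_level, c)
        new_box = []
        for x, (lo, hi), (outer_lo, outer_hi) in zip(best, box, outer):
            step = (hi - lo) / points_per_dim
            # Keep one cell on each side of the winner, clipped to the full range.
            new_box.append((max(outer_lo, x - step), min(outer_hi, x + step)))
        box = new_box
    return best


def toy_reward(arm, rng):
    """Hypothetical noisy reward peaking at power=0.7, offset=0.3 (not from the paper)."""
    power, offset = arm
    clean = math.exp(-((power - 0.7) ** 2 + (offset - 0.3) ** 2) / 0.05)
    return clean + rng.gauss(0.0, 0.03)


if __name__ == "__main__":
    env_rng = random.Random(1)
    best = stepwise_bandit(lambda arm: toy_reward(arm, env_rng),
                           [(0.0, 1.0), (0.0, 1.0)])
    print("selected (power, offset):", best)
```

With the default settings, each level plays only 16 joint arms instead of one fine grid over the whole box, which is the motivation the abstract gives for discretizing stepwise: fewer arms per stage means fewer interactions spent on exploration.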

CLC number: