Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (1): 199-208. doi: 10.12305/j.issn.1001-506X.2022.01.25

• Systems Engineering •

Scheduling strategies research based on reinforcement learning for wartime support force

Bin ZENG1, Rui WANG2,*, Houpu LI3, Xu FAN1

  1. Department of Management and Economics, Naval University of Engineering, Wuhan 430033, China
  2. Teaching and Research Support Center, Naval University of Engineering, Wuhan 430033, China
  3. Department of Navigation Engineering, Naval University of Engineering, Wuhan 430033, China
  • Received: 2020-11-28 Online: 2022-01-01 Published: 2022-01-19
  • Contact: Rui WANG
  • About the authors: Bin ZENG (born 1970), male, professor, Ph.D., main research interest: information management | Rui WANG (born 1975), female, librarian, M.S., main research interest: information management | Houpu LI (born 1985), male, associate professor, Ph.D., main research interest: computer algebra analysis | Xu FAN (born 1989), male, engineer, M.S., main research interest: information management
  • Supported by:
    National Natural Science Foundation of China (41771487); Hubei Provincial Science Fund for Distinguished Young Scholars (2019CFA086)

Abstract:

Intelligent scheduling of logistics and equipment support is a current research focus in the military field, and the complex, rapidly changing battlefield environment requires wartime support to be highly adaptive. To address this problem, a reinforcement learning model based on a Markov decision process (MDP) is proposed. The model actively learns the optimal dispatch policy and anticipates subsequent changes from historical data and the current situation. To account for uncertain events, a simulation procedure based on a probabilistic statistical model is added to the solution algorithm. To reduce the computational complexity caused by random events, the Bellman iteration equation is redesigned using post-decision state variables. To overcome the curse of dimensionality in the state space, an approximation function based on a combination of basis functions is proposed. Simulation experiments show that introducing reinforcement learning significantly improves the scheduling performance of wartime support forces.

Key words: wartime support, reinforcement learning, uncertainty, optimal scheduling
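
For readers unfamiliar with the two devices named in the abstract, the following is a generic textbook-style sketch of a Bellman recursion written around a post-decision state, together with a linear basis-function approximation. The notation is assumed here for illustration and is not taken from the paper: $S_t$ is the pre-decision state, $x_t$ the dispatch decision, $S_t^{x}$ the state immediately after the decision but before the next random event, $C$ the one-step contribution, $\gamma$ the discount factor, and $\phi_f$, $\theta_f$ the basis functions and their weights. The paper's own formulation may differ, for example by minimizing a cost instead of maximizing a contribution.

```latex
% Generic post-decision-state formulation (all symbols are assumptions,
% not the paper's notation).
% S_t^{x} is determined by (S_t, x_t) alone; S_{t+1} then depends on
% S_t^{x} and the random event arriving between t and t+1.
\begin{align}
V_t(S_t) &= \max_{x_t \in \mathcal{X}(S_t)}
            \Big( C(S_t, x_t) + \gamma\, V_t^{x}(S_t^{x}) \Big), \\
V_t^{x}(S_t^{x}) &= \mathbb{E}\big[\, V_{t+1}(S_{t+1}) \mid S_t^{x} \,\big], \\
\bar{V}(S_t^{x} \mid \theta) &= \sum_{f \in \mathcal{F}} \theta_f\, \phi_f(S_t^{x}).
\end{align}
```

In this form the expectation sits outside the maximization, so each decision step is a deterministic optimization once $V_t^{x}$ is replaced by $\bar{V}$, and only the weight vector $\theta$ has to be stored, not a value for every state.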

CLC number: