Systems Engineering and Electronics ›› 2024, Vol. 46 ›› Issue (7): 2310-2322. doi: 10.12305/j.issn.1001-506X.2024.07.15

• Systems Engineering •

融合三支多属性决策与SAC的兵棋推演智能决策技术

彭莉莎1,2, 孙宇祥1,*, 薛宇凡1, 周献中1,3   

  1. 南京大学工程管理学院, 江苏 南京 210008
    2. 浙江财经大学信息技术与人工智能学院, 浙江 杭州 310018
    3. 南京大学智能装备新技术研究中心, 江苏 南京 210008
  • 收稿日期:2023-08-28 出版日期:2024-06-28 发布日期:2024-07-02
  • 通讯作者: 孙宇祥
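  • About the authors: Lisha PENG (彭莉莎, born 1994), female, PhD, lecturer; main research interests: intelligent information processing and intelligent decision-making, three-way decisions
    Yuxiang SUN (孙宇祥, born 1990), male, assistant researcher, PhD; main research interests: intelligent gaming and decision-making
    Yufan XUE (薛宇凡, born 1998), male, master's degree; main research interest: intelligent wargaming
    Xianzhong ZHOU (周献中, born 1962), male, professor, PhD; main research interests: C2 system theory and technology, intelligent information processing, intelligent human-computer interaction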
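  • Funding:
    Youth Fund of the National Natural Science Foundation of China (62306135); Youth Fund of the Ministry of Education (23YJC630156); Jiangsu Province Youth Fund (BK20230783); Nanjing University Technology Innovation Fund (SC-2023-039)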

Intelligent decision-making technology for wargame by integrating three-way multiple attribute decision-making and SAC

Lisha PENG1,2, Yuxiang SUN1,*, Yufan XUE1, Xianzhong ZHOU1,3   

  1. School of Engineering Management, Nanjing University, Nanjing 210008, China
    2. School of Information Technology & Artificial Intelligence, Zhejiang University of Finance & Economics, Hangzhou 310018, China
    3. Research Center for New Technology in Intelligent Equipment, Nanjing University, Nanjing 210008, China
  • Received: 2023-08-28 Online: 2024-06-28 Published: 2024-07-02
  • Contact: Yuxiang SUN

摘要:

近年来, 将深度强化学习技术用于兵棋推演的智能对抗策略生成受到广泛关注。针对强化学习决策模型采样率低、训练收敛慢以及智能体博弈胜率低的问题, 提出一种融合三支多属性决策(three-way multiple attribute decision making, TWMADM)与强化学习的智能决策技术。基于经典软表演者-批评家(soft actor-critic, SAC)算法开发兵棋智能体, 利用TWMADM方法评估对方算子的威胁情况, 并将该威胁评估结果以先验知识的形式引入到SAC算法中规划战术决策。在典型兵棋推演系统中开展博弈对抗实验, 结果显示所提算法可有效加快训练收敛速度, 提升智能体的对抗策略生成效率和博弈胜率。

关键词: 兵棋推演, 三支多属性决策, 软表演者-批评家, 强化学习, 智能决策

Abstract:

In recent years, using deep reinforcement learning to generate intelligent confrontation strategies for wargaming has attracted widespread attention. To address the low sampling rate and slow training convergence of reinforcement learning decision models, as well as the low game winning rate of agents, an intelligent decision-making technology that integrates three-way multiple attribute decision making (TWMADM) with reinforcement learning is proposed. A wargaming agent is developed on the basis of the classical soft actor-critic (SAC) algorithm. The TWMADM method is used to evaluate the threat posed by the opposing operators, and the threat assessment results are introduced into the SAC algorithm as prior knowledge to plan tactical decisions. Game confrontation experiments are conducted in a typical wargame system, and the results show that the proposed algorithm effectively speeds up training convergence and improves both the agents' efficiency in generating adversarial strategies and their game winning rate.

Key words: wargame, three-way multiple attribute decision making (TWMADM), soft actor-critic (SAC), reinforcement learning (RL), intelligent decision
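
The abstract does not state how the threat assessment is computed or how it enters SAC, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each opposing operator is described by a few numeric attributes where larger means more threatening, aggregates them with a weighted sum, splits the scores into three regions with two thresholds (the basic three-way decision pattern), and then reweights a discrete action distribution by region. All function names, the thresholds `alpha` and `beta`, and the `boost` and `damp` factors are hypothetical.

```python
import numpy as np

# Region codes for the three-way decision (illustrative names, not from the paper).
HIGH_THREAT, UNCERTAIN, LOW_THREAT = 1, 0, -1


def threat_scores(attributes, weights):
    """Weighted sum of min-max normalised attributes, one score per operator.

    Assumption: every attribute is benefit-type (larger value = larger threat).
    """
    x = np.asarray(attributes, dtype=float)
    low = x.min(axis=0)
    span = x.max(axis=0) - low
    span[span == 0] = 1.0  # constant columns would otherwise divide by zero
    w = np.asarray(weights, dtype=float)
    return ((x - low) / span) @ (w / w.sum())


def three_way_regions(scores, alpha=0.7, beta=0.3):
    """Assign each score to the high-threat, uncertain, or low-threat region."""
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    s = np.asarray(scores, dtype=float)
    regions = np.full(s.shape, UNCERTAIN, dtype=int)
    regions[s >= alpha] = HIGH_THREAT
    regions[s <= beta] = LOW_THREAT
    return regions


def apply_threat_prior(policy_probs, regions, boost=2.0, damp=0.5):
    """Reweight per-target action probabilities by region and renormalise.

    Hypothetical way of injecting prior knowledge into a policy's output.
    """
    regions = np.asarray(regions)
    factor = np.ones(regions.shape, dtype=float)
    factor[regions == HIGH_THREAT] = boost
    factor[regions == LOW_THREAT] = damp
    p = np.asarray(policy_probs, dtype=float) * factor
    return p / p.sum()


# Three opposing operators, two made-up attributes each.
attrs = [[0.9, 0.8], [0.5, 0.5], [0.1, 0.2]]
scores = threat_scores(attrs, weights=[0.5, 0.5])
regions = three_way_regions(scores)
probs = apply_threat_prior([1 / 3, 1 / 3, 1 / 3], regions)
print(regions, probs)
```

In this sketch the uncertain region is left untouched, which mirrors the deferred-decision idea of three-way decisions: only operators that are clearly high or low threat change the action preferences.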

CLC number: