[1] SILVER D, HUBERT T, SCHRITTWIESER J, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play[J]. Science, 2018, 362(6419): 1140-1144. doi: 10.1126/science.aar6404
[2] SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[3] LEVINE S, FINN C, DARRELL T, et al. End-to-end training of deep visuomotor policies[J]. Journal of Machine Learning Research, 2016, 17(1): 1334-1373.
[4] ZHANG H D, LI D C, HE Y Q. Multi-robot cooperation strategy in game environment using deep reinforcement learning[C]//Proc. of the IEEE International Conference on Robotics and Biomimetics (ROBIO), 2018: 886-891.
[5] HISHMEH L, AWAD F. Deer in the headlights: short term planning via reinforcement learning algorithms for autonomous vehicles[C]//Proc. of the 11th International Conference on Information and Communication Systems, 2020: 255-260.
[6] MAKANTASIS K, KONTORINAKI M, NIKOLOS I. Deep reinforcement-learning-based driving policy for autonomous road vehicles[J]. IET Intelligent Transport Systems, 2020, 14(1): 13-24. doi: 10.1049/iet-its.2019.0249
[7] MAKANTASIS K, KONTORINAKI M, NIKOLOS I. Deep reinforcement-learning-based driving policy for autonomous road vehicles[J]. IET Intelligent Transport Systems, 2020, 14(1): 13-24. doi: 10.1049/iet-its.2019.0249
[8] VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575(7782): 350-354. doi: 10.1038/s41586-019-1724-z
[9] LAMPLE G, CHAPLOT D S. Playing FPS games with deep reinforcement learning[C]//Proc. of the Workshops at the AAAI Conference on Artificial Intelligence, 2017: 2140-2146.
[10] FAN J S, REN H, TIAN C X. An analysis of wargame rules simulation based on stochastic Lanchester models[C]//Proc. of the 6th International Conference on Network, Communication and Computing, Association for Computing Machinery, 2017: 135-139.
[11] MOY G, SHEKH S. The application of AlphaZero to wargaming[C]//Proc. of the 32nd Australasian Joint Conference on Artificial Intelligence, 2019.
[12] SHANG T F, DONG H Y, HAN K, et al. Research on evaluation method of wargame strategy based on fuzzy Petri net[C]//Proc. of the 2nd International Conference on Information Systems and Computer Aided Education, 2019: 626-629.
[13] 石崇林. 基于数据挖掘的兵棋推演数据分析方法研究[D]. 长沙: 国防科学技术大学, 2012.
SHI C L. Research on wargaming data analysis methods based on data mining[D]. Changsha: National University of Defense Technology, 2012.
[14] 秦园丽, 张训立, 高桂清, 等. 基于兵棋推演系统的作战方案评估方法研究[J]. 兵器装备工程学报, 2019, 40(6): 92-95.
QIN Y L, ZHANG X L, GAO G Q, et al. Research on evaluation method of operational plan based on tactical missile chess deduction system[J]. Journal of Ordnance Equipment Engineering, 2019, 40(6): 92-95.
[15] BHATNAGAR S, SUTTON R S, GHAVAMZADEH M, et al. Natural actor-critic algorithms[J]. Automatica, 2009, 45(11): 2471-2482. doi: 10.1016/j.automatica.2009.07.008
[16] GRZES M. Reward shaping in episodic reinforcement learning[C]//Proc. of the 16th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 2017: 565-573.
[17] ZHONG S, TAN J, DONG H, et al. Modeling-learning-based actor-critic algorithm with Gaussian process approximator[J]. Journal of Grid Computing, 2020, 18: 181-195. doi: 10.1007/s10723-020-09512-4
[18] WATKINS C, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3/4): 279-292. doi: 10.1023/A:1022676722315
[19] LITTMAN M L. Reinforcement learning improves behaviour from evaluative feedback[J]. Nature, 2015, 521(7553): 445-451. doi: 10.1038/nature14540
[20] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. 2nd ed. Cambridge: MIT Press, 2018.
[21] LITTMAN M L. Markov games as a framework for multi-agent reinforcement learning[C]//Proc. of the 11th International Conference on Machine Learning, New Brunswick, 1994: 157-163.
[22] HEREDIA P C, MOU S S. Distributed multi-agent reinforcement learning by actor-critic method[J]. IFAC-PapersOnLine, 2019, 52(20): 363-368. doi: 10.1016/j.ifacol.2019.12.182
[23] LIU Y X, TAN Y. Learning distributed coordinated policy in catching game with multi-agent reinforcement learning[C]//Proc. of the International Joint Conference on Neural Networks, 2019.
[24] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[25] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. doi: 10.1038/nature14236
[26] DURYEA E, GANGER M, HU W. Exploring deep reinforcement learning with multi-Q-learning[J]. Intelligent Control and Automation, 2016, 7(4): 129-144. doi: 10.4236/ica.2016.74012
[27] SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. doi: 10.1038/nature16961
[28] 周志华. 机器学习[M]. 北京: 清华大学出版社, 2016: 347-350.
ZHOU Z H. Machine learning[M]. Beijing: Tsinghua University Press, 2016: 347-350.
[29] 闫科, 张永亮, 陶伟. 突破·铁甲指挥官[M]. 北京: 电子工业出版社, 2018.
YAN K, ZHANG Y L, TAO W. Breakthrough: commander of iron armor[M]. Beijing: Electronic Industry Press, 2018.
[30] LI L, LI D, SONG T, et al. Actor-Critic learning control based on regularized temporal-difference prediction with gradient correction[J]. IEEE Trans. on Neural Networks and Learning Systems, 2018, 29(12): 5899-5909. doi: 10.1109/TNNLS.2018.2808203