Systems Engineering and Electronics ›› 2021, Vol. 43 ›› Issue (2): 420-433. doi: 10.12305/j.issn.1001-506X.2021.02.17

• Systems Engineering •

Parallel priority experience replay mechanism of MADDPG algorithm

Ang GAO1, Zhiming DONG1,*, Liang LI1, Jinghua SONG1, Li DUAN2

  1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China
    2. Unit 61516 of the PLA, Beijing 100076, China
  • Received:2020-03-06 Online:2021-02-01 Published:2021-03-16
  • Contact: Zhiming DONG. E-mail: 15689783388@163.com; 236211588@qq.com; liliang_zgy@163.com; jhsong@sina.com; 236211566@qq.com

Abstract:

The multi-agent deep deterministic policy gradient (MADDPG) algorithm is an important deep reinforcement learning algorithm in the field of multi-agent systems (MAS). To improve its performance, a parallel prioritized experience replay mechanism is proposed for the algorithm. The framework and training method of MADDPG are first analyzed. Exploiting the algorithm's centralized-training, distributed-execution structure, sampling from the multi-agent experience replay pools is carried out in parallel, and a prioritized experience replay mechanism is introduced into the sampling process. In this way, experience data flows in parallel, the data-processing models work in parallel, and high-priority experience data is replayed preferentially. Finally, the improved algorithm is compared against the original along two dimensions, training episodes and training time, in typical OpenAI multi-agent competitive and cooperative environments. The results show that introducing the parallel prioritized experience replay mechanism significantly improves the efficiency of the algorithm.
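The abstract names two ingredients without giving code: prioritized sampling from each agent's replay pool, and running those samplers in parallel across agents. The sketch below is a minimal Python illustration of that combination under stated assumptions, not the authors' implementation; the names PrioritizedReplayBuffer and parallel_sample and the hyperparameters alpha and beta are hypothetical, following the standard proportional prioritized replay formulation of Schaul et al. (2016).

import numpy as np
from concurrent.futures import ThreadPoolExecutor

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (sketch, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities bias sampling (0 = uniform)
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.size = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_p = self.priorities[:self.size].max() if self.size else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        # Sampling probability proportional to priority^alpha.
        p = self.priorities[:self.size] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(self.size, batch_size, p=p)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (self.size * p[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return idx, batch, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is the absolute TD error, offset so no sample starves.
        self.priorities[idx] = np.abs(td_errors) + eps

# One buffer per agent; sampling runs concurrently across agents instead
# of sequentially, mirroring the parallel sampling the abstract describes.
def parallel_sample(buffers, batch_size):
    with ThreadPoolExecutor(max_workers=len(buffers)) as pool:
        return list(pool.map(lambda b: b.sample(batch_size), buffers))

In a MADDPG-style training loop, each agent would hold one such buffer; parallel_sample draws every agent's minibatch concurrently before the centralized critic update, and update_priorities is then called with each agent's TD errors.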

Key words: multi-agent system (MAS), deep reinforcement learning, parallel method, priority experience replay, deep deterministic policy gradient

CLC Number: 
