
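Parallel priority experience replay mechanism for the MADDPG algorithm
Cite this article: GAO Ang, DONG Zhiming, LI Liang, SONG Jinghua, DUAN Li. Parallel priority experience replay mechanism of MADDPG algorithm[J]. Systems Engineering and Electronics, 2021, 43(2): 420-433.
Authors: GAO Ang  DONG Zhiming  LI Liang  SONG Jinghua  DUAN Li
Affiliations: 1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072; 2. Unit 61516 of the PLA, Beijing 100076
Funding: Supported by the Military Scientific Research Program (41405030302, 41401020301).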
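Abstract: The multi-agent deep deterministic policy gradient (MADDPG) algorithm is an important application of deep reinforcement learning to multi-agent systems (MAS). To improve its performance, a MADDPG algorithm based on a parallel priority experience replay mechanism is proposed. The algorithm framework and training method are analyzed. Because the algorithm is trained centrally and executed in a distributed manner, data are sampled from the experience replay pool in parallel, and a priority replay mechanism is introduced into the sampling process, so that experience data flow in parallel, the data processing models work in parallel, and experience data are replayed by priority. The improved algorithm is compared and validated in two typical kinds of OpenAI multi-agent environment, adversarial and cooperative, along two dimensions: number of training episodes and training time. The results show that introducing the parallel priority experience replay mechanism markedly improves the performance of the algorithm.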

Keywords: multi-agent system  deep reinforcement learning  parallel method  priority experience replay  deep deterministic policy gradient
Received: 2020-03-06

Parallel priority experience replay mechanism of MADDPG algorithm
GAO Ang, DONG Zhiming, LI Liang, SONG Jinghua, DUAN Li. Parallel priority experience replay mechanism of MADDPG algorithm[J]. Systems Engineering and Electronics, 2021, 43(2): 420-433.
Authors:GAO Ang  DONG Zhiming  LI Liang  SONG Jinghua  DUAN Li
Institution: 1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2. Unit 61516 of the PLA, Beijing 100076, China
Abstract: The multi-agent deep deterministic policy gradient (MADDPG) algorithm is an important application of deep reinforcement learning to multi-agent systems (MAS). To improve the performance of the algorithm, a parallel priority experience replay mechanism is proposed. The algorithm framework and training method are analyzed. Because the algorithm is trained centrally and executed in a distributed manner, data are sampled from the multi-agent experience replay pool in parallel, and a priority experience replay mechanism is introduced into the sampling process. As a result, experience data flow in parallel, the data processing models work in parallel, and experience data are replayed by priority. Finally, the improved algorithm is compared and verified in typical OpenAI multi-agent adversarial and cooperative environments along two dimensions: training episodes and training time. The results show that introducing the parallel priority experience replay mechanism clearly improves the performance of the algorithm.
Keywords:multi-agent system(MAS)  deep reinforcement learning  parallel method  priority experience replay  deep deterministic policy gradient
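
The abstract's central idea, drawing transitions by priority while sampling each agent's replay pool in parallel, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes proportional prioritization (sampling probability proportional to |TD error| ** alpha), one buffer per agent, and a thread pool as the parallel mechanism. The names `PrioritizedReplayBuffer`, `parallel_sample`, `alpha`, and `eps` are hypothetical, and importance-sampling correction is omitted for brevity.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class PrioritizedReplayBuffer:
    """Proportional prioritized replay for one agent (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, seed=None):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priority skews sampling
        self.data = []              # stored transitions
        self.priorities = []        # one priority per stored transition
        self.pos = 0                # next slot to overwrite once full
        self.rng = random.Random(seed)  # per-buffer RNG, not shared across threads

    def add(self, transition):
        # New transitions get the current maximum priority so that each
        # one is likely to be replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Draw indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idx = self.rng.choices(range(len(self.data)), weights=weights, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priority is the absolute TD error plus a small constant, so that
        # no transition ends up with zero sampling probability.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps


def parallel_sample(buffers, batch_size):
    """Sample every agent's buffer concurrently, one task per buffer."""
    with ThreadPoolExecutor(max_workers=len(buffers)) as pool:
        futures = [pool.submit(buf.sample, batch_size) for buf in buffers]
        return [f.result() for f in futures]
```

In a training loop, each agent's critic would presumably compute TD errors on its sampled batch and feed them back through `update_priorities`. Threads are used here only to show the structure; in CPython, CPU-bound sampling would likely need processes or vectorized code to yield a real speed-up.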
This article is indexed in VIP (维普) and other databases.