Parallel priority experience replay mechanism of MADDPG algorithm



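Cite this article:	GAO Ang, DONG Zhiming, LI Liang, SONG Jinghua, DUAN Li. Parallel priority experience replay mechanism of MADDPG algorithm[J]. Systems Engineering and Electronics, 2021, 43(2): 420-433.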

Authors:	GAO Ang, DONG Zhiming, LI Liang, SONG Jinghua, DUAN Li

Affiliations:	1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072; 2. Unit 61516 of the PLA, Beijing 100076

Funding:	Supported by the Military Scientific Research Program (41405030302, 41401020301).

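Abstract (translated from the Chinese):	The multi-agent deep deterministic policy gradient (MADDPG) algorithm is an important application of deep reinforcement learning to multi-agent systems (MAS). To improve the algorithm's performance, an MADDPG algorithm based on a parallel priority experience replay mechanism is proposed. The algorithm's framework and training method are analyzed. Because the algorithm is trained centrally and executed in a distributed manner, data are sampled from the experience replay pool in parallel, and a priority replay mechanism is introduced into the sampling process, so that experience data flow in parallel, the data processing models work in parallel, and experience data are replayed by priority. The improved algorithm is evaluated comparatively in two typical OpenAI multi-agent environments, adversarial and cooperative, along two dimensions: number of training episodes and training time. The results show that introducing the parallel priority experience replay mechanism clearly improves the algorithm's performance.
Keywords:	multi-agent system; deep reinforcement learning; parallel method; priority experience replay; deep deterministic policy gradient
Received:	2020-03-06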
Parallel priority experience replay mechanism of MADDPG algorithm

GAO Ang, DONG Zhiming, LI Liang, SONG Jinghua, DUAN Li. Parallel priority experience replay mechanism of MADDPG algorithm[J]. Systems Engineering and Electronics, 2021, 43(2): 420-433.

Authors:	GAO Ang, DONG Zhiming, LI Liang, SONG Jinghua, DUAN Li

Institution:	1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2. Unit 61516 of the PLA, Beijing 100076, China

Abstract:	The multi-agent deep deterministic policy gradient (MADDPG) algorithm is an important application of deep reinforcement learning to multi-agent systems (MAS). To improve the performance of the algorithm, a parallel priority experience replay mechanism is proposed for it. The algorithm framework and training method are analyzed. Exploiting the algorithm's centralized training and distributed execution, data are sampled from the multi-agent experience replay pool in parallel, and a priority experience replay mechanism is introduced into the sampling process. As a result, experience data flow in parallel, the data processing models work in parallel, and experience data are replayed by priority. Finally, the improved algorithm is evaluated comparatively in two typical OpenAI multi-agent environments, adversarial and cooperative, along two dimensions: number of training episodes and training time. The results show that introducing the parallel priority experience replay mechanism clearly improves the efficiency of the algorithm.

Keywords:	multi-agent system (MAS); deep reinforcement learning; parallel method; priority experience replay; deep deterministic policy gradient
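
Illustrative sketch (not taken from the paper): the abstract names two ingredients, priority-based sampling from the replay pool and parallel sampling, but this record gives no implementation details. The Python below shows one common way such ideas are realised, using proportional prioritization with importance-sampling weights and a thread pool that draws one batch per agent. The class PrioritizedReplayBuffer, the function sample_all_agents, the parameters alpha, beta and eps, and the one-buffer-per-agent layout are all assumptions made for illustration and may differ from the authors' design.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class PrioritizedReplayBuffer:
    """Hypothetical buffer: sampling probability P(i) = p_i**alpha / sum_k p_k**alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives fully proportional sampling
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so they get replayed early.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        n = len(self.data)
        weights = [p ** self.alpha for p in self.priorities]
        total = sum(weights)
        # Draw indices with replacement, proportionally to the exponentiated priorities.
        idx = rng.choices(range(n), weights=weights, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalised by the largest weight in the batch.
        is_w = [(n * weights[i] / total) ** (-beta) for i in idx]
        max_w = max(is_w)
        is_w = [w / max_w for w in is_w]
        return idx, [self.data[i] for i in idx], is_w

    def update_priorities(self, idx, td_errors):
        # After a learning step, priorities follow the magnitude of the TD error.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps


def sample_all_agents(buffers, batch_size):
    """Draw one batch per agent buffer concurrently (assumed per-agent buffers)."""
    with ThreadPoolExecutor(max_workers=len(buffers)) as pool:
        futures = [pool.submit(buf.sample, batch_size) for buf in buffers]
        return [f.result() for f in futures]
```

In this sketch a thread pool is used only to show the shape of parallel sampling; whether the paper parallelises with threads, processes, or separate workers is not stated in this record.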
This article is indexed in databases including VIP.
