TD3 Algorithm with Dynamic Delayed Policy Update
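Cite this article: KANG Chaohai, SUN Chao, RONG Chuiting, LIU Pengyun. TD3 Algorithm with Dynamic Delayed Policy Update[J]. Journal of Jilin University (Information Science Edition), 2020, 38(4): 474-481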
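Authors: KANG Chaohai  SUN Chao  RONG Chuiting  LIU Pengyun
Affiliation: School of Electrical Engineering and Information, Northeast Petroleum University, Daqing 163318, Heilongjiang, China
Funding: Natural Science Foundation of Heilongjiang Province (E2018004)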
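Abstract: In deep reinforcement learning, to further reduce the effect of value overestimation on policy estimation in TD3 (Twin Delayed Deep Deterministic Policy Gradients) and to speed up model learning, this paper proposes DD-TD3 (Twin Delayed Deep Deterministic Policy Gradients with Dynamic Delayed Policy Update). In DD-TD3, the dynamic difference between the Critic network's latest loss value and its exponentially weighted moving average guides the delayed-update step size of the Actor network. Experimental results show that, whereas the original TD3 algorithm reaches a relatively high reward value at 2 000 steps, DD-TD3 learns the optimal control policy within about 1 000 steps and obtains a higher reward value, thereby improving the efficiency of finding the optimal policy.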

Keywords: deep reinforcement learning; TD3 algorithm; dynamic delayed policy update
Received: 2020-01-17

TD3 Algorithm with Dynamic Delayed Policy Update
KANG Chaohai, SUN Chao, RONG Chuiting, LIU Pengyun. TD3 Algorithm with Dynamic Delayed Policy Update[J]. Journal of Jilin University: Information Sci Ed, 2020, 38(4): 474-481
Authors: KANG Chaohai  SUN Chao  RONG Chuiting  LIU Pengyun
Affiliation: School of Electrical Engineering and Information, Northeast Petroleum University, Daqing 163318, China
Abstract: In the field of deep reinforcement learning, to further reduce the impact of value overestimation on policy estimation in TD3 (Twin Delayed Deep Deterministic Policy Gradients) and to accelerate model learning, DD-TD3 (Twin Delayed Deep Deterministic Policy Gradients with Dynamic Delayed Policy Update) is proposed. The delayed update step size of the actor network is guided by the dynamic difference between the latest loss of the critic network and its exponentially weighted moving average. Experimental results show that, compared with the original TD3 algorithm, which obtains a high reward value at 2 000 steps, the DD-TD3 method learns the optimal control strategy in about 1 000 steps and obtains a higher reward value, thereby improving the efficiency of finding the optimal strategy.
Keywords: deep reinforcement learning; twin delayed deep deterministic policy gradients (TD3); dynamic delayed policy update
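
The abstract says only that the actor's delayed-update step size is guided by the difference between the critic's latest loss and its exponentially weighted moving average; it does not give the exact rule. The sketch below is one possible reading, not the authors' method. The class name `DynamicDelay`, the smoothing factor `beta`, the bounds `min_delay` and `max_delay`, and the "lengthen the delay when the loss rises above its average, shorten it when the loss falls below" rule are all assumptions made for illustration. The starting delay of 2 matches the fixed delay commonly used in standard TD3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicDelay:
    """Hypothetical scheduler for the actor update delay (illustrative only)."""

    beta: float = 0.9           # EMA smoothing factor (assumed value)
    min_delay: int = 1          # assumed lower bound on critic steps per actor update
    max_delay: int = 4          # assumed upper bound
    delay: int = 2              # start from the fixed delay commonly used in TD3
    ema: Optional[float] = None # running average of the critic loss

    def update(self, critic_loss: float) -> int:
        """Record the latest critic loss and return the delay to use next."""
        if self.ema is None:
            # First observation: initialise the average, keep the default delay.
            self.ema = critic_loss
            return self.delay

        # Dynamic difference between the latest loss and its moving average.
        diff = critic_loss - self.ema

        if diff > 0:
            # Assumed rule: the critic looks less reliable, so wait longer.
            self.delay = min(self.delay + 1, self.max_delay)
        elif diff < 0:
            # Assumed rule: the critic is improving, so update the actor sooner.
            self.delay = max(self.delay - 1, self.min_delay)

        # Exponentially weighted moving average of the critic loss.
        self.ema = self.beta * self.ema + (1.0 - self.beta) * critic_loss
        return self.delay
```

In a training loop, `update` would be called after each critic step with the scalar critic loss, and the actor and target networks would be updated once the number of critic steps since the last actor update reaches the returned delay.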
  