Metropolis Policy-based Multi-step Q Learning Algorithm and Performance Simulation
Citation: CHEN Sheng-lei, WU Hui-zhong, XIAO Liang, ZHU Yao-qing. Metropolis Policy-based Multi-step Q Learning Algorithm and Performance Simulation [J]. Journal of System Simulation, 2007, 19(6): 1284-1287.
Authors: CHEN Sheng-lei, WU Hui-zhong, XIAO Liang, ZHU Yao-qing
Affiliation: School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
Funding: Weapon Equipment Pre-research Foundation; Nanjing University of Science and Technology scientific research and teaching-reform program
Abstract: Reinforcement learning is currently a focus of research on agents and machine learning. To overcome the slow update speed of the standard Q-learning algorithm, a novel multi-step Q-learning algorithm, simulated annealing-based multi-step Q-learning (SAMQ), is proposed by introducing a multi-step information-update strategy together with the Metropolis criterion from simulated annealing. Simulation experiments show that, compared with existing algorithms, SAMQ effectively improves convergence speed and properly resolves the key problem an agent faces when selecting actions: whether to explore new knowledge or to follow the current policy.

Keywords: reinforcement learning; Q learning; simulated annealing; multi-step Q learning; Metropolis criterion
Article ID: 1004-731X(2007)06-1284-04
Received: 2006-01-17
Revised: 2006-12-25
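
The abstract combines two mechanisms: a Metropolis acceptance rule borrowed from simulated annealing, which decides between a random exploratory action and the current greedy action, and a multi-step backup that propagates reward information over several transitions at once. The sketch below is a minimal tabular illustration of how these two ideas can fit together; it is not the paper's exact SAMQ algorithm, and the class name, hyperparameter values, and environment interface are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict, deque

class SAMQSketch:
    """Minimal tabular sketch of Metropolis-style exploration plus an
    n-step Q backup. Illustrative only, not the paper's exact SAMQ."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 n_steps=3, temp=1.0, cooling=0.99):
        self.q = defaultdict(float)   # Q(state, action) table, default 0.0
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.n = n_steps              # depth of the multi-step backup
        self.temp = temp              # Metropolis temperature
        self.cooling = cooling        # geometric cooling factor

    def select_action(self, state):
        # Metropolis criterion: propose a random action and accept it with
        # probability exp((Q_proposal - Q_greedy) / T); otherwise act greedily.
        greedy = max(self.actions, key=lambda a: self.q[(state, a)])
        proposal = random.choice(self.actions)
        dq = self.q[(state, proposal)] - self.q[(state, greedy)]
        if dq >= 0 or random.random() < math.exp(dq / self.temp):
            return proposal           # exploratory move accepted
        return greedy                 # otherwise follow the current policy

    def multi_step_update(self, transitions, next_state):
        # Fold the stored rewards into an n-step return
        #   G = r0 + g*r1 + ... + g^(n-1)*r(n-1) + g^n * max_a Q(s', a)
        # and back it up into the oldest (state, action) pair.
        g = max(self.q[(next_state, a)] for a in self.actions)
        for _, _, reward in reversed(transitions):
            g = reward + self.gamma * g
        s0, a0, _ = transitions[0]
        self.q[(s0, a0)] += self.alpha * (g - self.q[(s0, a0)])

    def cool(self):
        # Lower the temperature so behaviour anneals from exploration
        # toward pure exploitation as learning progresses.
        self.temp = max(self.temp * self.cooling, 1e-3)
```

A hypothetical training loop, continuing the sketch above (the `env.reset()`/`env.step()` interface is an assumption, not from the paper):

```python
agent = SAMQSketch(actions=[0, 1, 2, 3])
for episode in range(500):
    state, window = env.reset(), deque(maxlen=agent.n)   # env is assumed
    done = False
    while not done:
        action = agent.select_action(state)
        next_state, reward, done = env.step(action)      # assumed signature
        window.append((state, action, reward))
        if len(window) == agent.n:
            # oldest stored pair receives the n-step backup
            agent.multi_step_update(list(window), next_state)
        state = next_state
    agent.cool()
```

At high temperature nearly any proposal is accepted, giving near-random exploration; as the temperature cools, only proposals whose Q-value is close to the greedy one survive, which realises the exploration-versus-exploitation schedule the abstract describes.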

Indexed in: CNKI, VIP, Wanfang Data