Metropolis Policy-based Multi-step Q Learning Algorithm and Performance Simulation
Citation: CHEN Sheng-lei, WU Hui-zhong, XIAO Liang, ZHU Yao-qing. Metropolis Policy-based Multi-step Q Learning Algorithm and Performance Simulation [J]. Journal of System Simulation, 2007, 19(6): 1284-1287.
Authors: CHEN Sheng-lei, WU Hui-zhong, XIAO Liang, ZHU Yao-qing
Affiliation: School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China
Funding: Weapon Equipment Pre-research Foundation; Nanjing University of Science and Technology scientific research and teaching-reform program
Abstract: Reinforcement learning is currently a focus of research on agents and machine learning. To overcome the slow update speed of the standard Q-learning algorithm, a novel multi-step Q-learning algorithm, simulated annealing-based multi-step Q-learning (SAMQ), is proposed by introducing a multi-step information-update strategy together with the Metropolis criterion from simulated annealing. Simulation experiments show that, compared with existing algorithms, SAMQ effectively improves convergence speed and properly resolves the key problem an agent faces when selecting actions: whether to explore new knowledge or to follow the current policy.

Keywords: reinforcement learning; Q learning; simulated annealing; multi-step Q learning; Metropolis criterion
Article ID: 1004-731X(2007)06-1284-04
Received: 2006-01-17
Revised: 2006-12-25
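
The abstract combines two mechanisms: a Metropolis acceptance rule borrowed from simulated annealing, which decides between a random exploratory action and the current greedy action, and a multi-step backup that propagates reward information over several transitions at once. The sketch below is a minimal tabular illustration of how these two ideas can fit together; it is not the paper's exact SAMQ algorithm, and the class name, hyperparameter values, and environment interface are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict, deque

class SAMQSketch:
    """Minimal tabular sketch of Metropolis-style exploration plus an
    n-step Q backup. Illustrative only, not the paper's exact SAMQ."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 n_steps=3, temp=1.0, cooling=0.99):
        self.q = defaultdict(float)   # Q(state, action) table, default 0.0
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.n = n_steps              # depth of the multi-step backup
        self.temp = temp              # Metropolis temperature
        self.cooling = cooling        # geometric cooling factor

    def select_action(self, state):
        # Metropolis criterion: propose a random action and accept it with
        # probability exp((Q_proposal - Q_greedy) / T); otherwise act greedily.
        greedy = max(self.actions, key=lambda a: self.q[(state, a)])
        proposal = random.choice(self.actions)
        dq = self.q[(state, proposal)] - self.q[(state, greedy)]
        if dq >= 0 or random.random() < math.exp(dq / self.temp):
            return proposal           # exploratory move accepted
        return greedy                 # otherwise follow the current policy

    def multi_step_update(self, transitions, next_state):
        # Fold the stored rewards into an n-step return
        #   G = r0 + g*r1 + ... + g^(n-1)*r(n-1) + g^n * max_a Q(s', a)
        # and back it up into the oldest (state, action) pair.
        g = max(self.q[(next_state, a)] for a in self.actions)
        for _, _, reward in reversed(transitions):
            g = reward + self.gamma * g
        s0, a0, _ = transitions[0]
        self.q[(s0, a0)] += self.alpha * (g - self.q[(s0, a0)])

    def cool(self):
        # Lower the temperature so behaviour anneals from exploration
        # toward pure exploitation as learning progresses.
        self.temp = max(self.temp * self.cooling, 1e-3)
```

A hypothetical training loop, continuing the sketch above (the `env.reset()`/`env.step()` interface is an assumption, not from the paper):

```python
agent = SAMQSketch(actions=[0, 1, 2, 3])
for episode in range(500):
    state, window = env.reset(), deque(maxlen=agent.n)   # env is assumed
    done = False
    while not done:
        action = agent.select_action(state)
        next_state, reward, done = env.step(action)      # assumed signature
        window.append((state, action, reward))
        if len(window) == agent.n:
            # oldest stored pair receives the n-step backup
            agent.multi_step_update(list(window), next_state)
        state = next_state
    agent.cool()
```

At high temperature nearly any proposal is accepted, giving near-random exploration; as the temperature cools, only proposals whose Q-value is close to the greedy one survive, which realises the exploration-versus-exploitation schedule the abstract describes.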

Indexed in: CNKI, VIP, Wanfang Data