Funding: National Natural Science Foundation of China; Trans-Century Training Program for Excellent Talents of the Ministry of Education; Shanghai "Dawn Program"
Article ID: 0253-374X(2007)04-531-06
Revised: 2005-09-09

Action Values Based Reinforcement Learning and Optimized Reward Functions
CHEN Qijun,XIAO Yunwei.Action Values Based Reinforcement Learning and Optimized Reward Functions[J].Journal of Tongji University(Natural Science),2007,35(4):531-536.
Authors: CHEN Qijun, XIAO Yunwei
Institution:Department of Control Science and Engineering, Tongji University, Shanghai 200092, China
Abstract: A new reinforcement learning algorithm is proposed to address the slow convergence of existing algorithms and to improve the design of reward functions. The algorithm uses "action values" as the basis for an agent's action selection. Because action values are more flexible than traditional state values, it is easier to design better-optimized reward functions around them and thereby improve learning performance. Building on action values, an exponential function and a logarithmic function are used to compute rewards and the discount rate dynamically, which accelerates the agent's selection of optimal actions. A computer simulation of a maze problem shows that the new algorithm significantly reduces the number of actions the agent executes before convergence, thus increasing convergence speed.
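The abstract describes the approach only at a high level; the exact exponential reward and logarithmic discount formulas are not given here. Below is a minimal Python sketch of the general idea on a toy maze: a tabular Q-learner whose reward and discount rate are computed dynamically from the current action values. The specific shaping forms (`exp(0.1·q)` scaling and `0.5 + 0.1·log1p(·)` discount) are illustrative placeholders, not the authors' formulas.

```python
import math
import random

def run_q_learning(episodes=500, seed=0):
    """Tabular Q-learning on a 4x4 grid maze with dynamic reward/discount.

    The exponential reward scaling and logarithmic discount below are
    hypothetical stand-ins for the paper's (unstated) functions.
    """
    rng = random.Random(seed)
    size, goal = 4, (3, 3)
    actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
    Q = {}  # (state, action_index) -> action value

    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s, steps = (0, 0), 0
        while s != goal and steps < 200:
            steps += 1
            # epsilon-greedy selection driven by action values
            if rng.random() < 0.1:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: q(s, i))
            dx, dy = actions[a]
            nx, ny = s[0] + dx, s[1] + dy
            s2 = (nx, ny) if 0 <= nx < size and 0 <= ny < size else s
            # Placeholder exponential reward: scales the base reward by the
            # current action value (clipped to keep exp bounded).
            base = 1.0 if s2 == goal else -0.01
            r = base * math.exp(0.1 * min(q(s, a), 5.0))
            best_next = max(q(s2, i) for i in range(4))
            # Placeholder logarithmic discount: grows slowly with the best
            # successor action value, capped below 1.
            gamma = min(0.99, 0.5 + 0.1 * math.log1p(max(best_next, 0.0)))
            Q[(s, a)] = q(s, a) + 0.5 * (r + gamma * best_next - q(s, a))
            s = s2
    return Q

def greedy_path_length(Q):
    """Follow the greedy policy from the start; return steps taken (cap 50)."""
    s, steps = (0, 0), 0
    size, goal = 4, (3, 3)
    actions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    while s != goal and steps < 50:
        a = max(range(4), key=lambda i: Q.get((s, i), 0.0))
        dx, dy = actions[a]
        nx, ny = s[0] + dx, s[1] + dy
        s = (nx, ny) if 0 <= nx < size and 0 <= ny < size else s
        steps += 1
    return steps
```

After training, the greedy policy should reach the goal in a short path; the point of the sketch is only that both the reward and the discount rate are recomputed each step from the current action values, rather than being fixed constants.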
Keywords: reinforcement learning; action values; Q algorithm; reward functions
This article is indexed by CNKI, VIP (Weipu), Wanfang Data, and other databases.