A Function Method for Accelerating Reinforcement Learning in Machine Learning |
| |
Affiliation: | 1. Yunnan Jiao Tong Vocational and Technical College, 2. Yunnan University |
| |
Abstract: | The value function must be updated repeatedly until it converges, which is the root cause of slow reinforcement learning. A learning method that updates the value function in batches is proposed to accelerate reinforcement learning by speeding up value-function convergence. During a training episode, the method records the state transition sequence from the initial state to the current state and derives from it the shortest state trajectories from other states to the current state. The updated value function of the current state is then propagated backward along these shortest state trajectories, so that a batch of value functions is updated at once. Simulation experiments on finding the shortest path in a Grid-World show that the method significantly increases the update frequency of the value function and shortens learning time. |
| |
Keywords: | reinforcement learning; value function; shortest state trajectory; accelerated learning; Grid-World |
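The batch update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update rule, so the sketch assumes a tabular state-value function, deterministic transitions, a discount factor `gamma`, and a simple max-style backup. The names `shortest_hops_to` and `batch_backup` are hypothetical and were chosen for illustration only.

```python
from collections import deque


def shortest_hops_to(current, trajectory):
    """Breadth-first search over the transitions observed in `trajectory`,
    walking each edge backward from `current`.

    Returns {state: (hops to current, next state on one shortest path)}.
    """
    predecessors = {}
    for state, next_state in zip(trajectory, trajectory[1:]):
        if state != next_state:
            predecessors.setdefault(next_state, set()).add(state)

    info = {current: (0, None)}
    queue = deque([current])
    while queue:
        state = queue.popleft()
        for prev in predecessors.get(state, ()):
            if prev not in info:
                info[prev] = (info[state][0] + 1, state)
                queue.append(prev)
    return info


def batch_backup(values, trajectory, gamma=0.9, step_reward=0.0):
    """Propagate the value of the latest state backward along shortest
    observed paths. Assumed rule: V(s) <- max(V(s), r + gamma * V(s_next)).

    Mutates `values` in place and returns the number of states improved.
    """
    current = trajectory[-1]
    info = shortest_hops_to(current, trajectory)
    # Nearest states first, so every successor is refreshed before it is used.
    ordered = sorted((s for s in info if s != current), key=lambda s: info[s][0])
    improved = 0
    for state in ordered:
        successor = info[state][1]
        candidate = step_reward + gamma * values.get(successor, 0.0)
        if candidate > values.get(state, 0.0):
            values[state] = candidate
            improved += 1
    return improved


# Hypothetical episode on a grid: the agent detours through (0, 2)
# before reaching the goal cell (2, 1), whose value was just set to 1.0.
episode = [(0, 0), (0, 1), (0, 2), (0, 1), (1, 1), (2, 1)]
values = {(2, 1): 1.0}
improved = batch_backup(values, episode)
```

In this example the walk from `(0, 0)` to the goal took five steps, but the shortest path through the recorded transitions takes three, so `(0, 0)` receives the goal value discounted three times rather than five. A single call refreshes every state visited in the episode, which is the batch effect the abstract attributes to the method.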
|
Source: | 《云南大学学报(自然科学版)》 (Journal of Yunnan University, Natural Sciences Edition) |