Application of Least Squares Support Vector Machine to Reinforcement Learning System
Cite this article: WANG Xue-song, TIAN Xi-lan, CHENG Yu-hu, MA Xiao-ping. Application of Least Squares Support Vector Machine to Reinforcement Learning System[J]. Journal of System Simulation, 2008, 20(14).
Authors: WANG Xue-song  TIAN Xi-lan  CHENG Yu-hu  MA Xiao-ping
Affiliation: School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221008, China
Funding: Specialized Research Fund for the Doctoral Program of Higher Education (Ministry of Education); China Postdoctoral Science Foundation; Jiangsu Province Postdoctoral Science Foundation; China University of Mining and Technology research and teaching reform project; Jiangsu Provincial Department of Education Qing Lan Project
Abstract: Q-learning in a continuous state space is formulated as a regression estimation problem for a least squares support vector machine (LS-SVM). The good generalization and nonlinear approximation properties of the LS-SVM are used to realize the mapping from state-action pairs to the Q-value function. To maintain computation speed and meet the needs of online learning in a Q-learning system, the LS-SVM training samples are taken from a moving window: sample data are collected while the Q-learning system learns, and the LS-SVM is trained on them at the same time. Simulation results on the mountain car control problem show that the method learns efficiently and can effectively solve the generalization problem for continuous state spaces in reinforcement learning systems.

Keywords: least squares support vector machine  reinforcement learning  Q-learning  generalization
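
The moving-window training described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel, the window size, or how the action enters the regressor, so the RBF kernel, the hyperparameters `window`, `gamma`, `sigma` and `discount`, the encoding of the action as one extra numeric feature, and the names `WindowedLSSVMQ` and `q_target` are all assumptions made for illustration. The sketch refits a standard LS-SVM regressor on the most recent samples each time a new (state-action, Q target) pair arrives.

```python
import math
from collections import deque


def rbf(u, v, sigma):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class WindowedLSSVMQ:
    """Q(s, a) approximated by an LS-SVM trained on a moving window."""

    def __init__(self, window=50, gamma=100.0, sigma=0.5):
        # Hypothetical hyperparameters; deque drops the oldest sample itself.
        self.buf = deque(maxlen=window)
        self.gamma, self.sigma = gamma, sigma
        self.X, self.alpha, self.b = [], [], 0.0

    def predict(self, state, action):
        # Assumption: the action is appended to the state as one feature.
        x = list(state) + [action]
        return self.b + sum(a * rbf(x, xi, self.sigma)
                            for a, xi in zip(self.alpha, self.X))

    def add_sample(self, state, action, target):
        """Store one (state-action, Q target) pair and refit on the window."""
        self.buf.append((list(state) + [action], target))
        self.X = [x for x, _ in self.buf]
        y = [t for _, t in self.buf]
        n = len(self.X)
        # LS-SVM dual system: [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]
        A = [[0.0] + [1.0] * n]
        for i in range(n):
            row = [1.0] + [rbf(self.X[i], self.X[j], self.sigma)
                           for j in range(n)]
            row[i + 1] += 1.0 / self.gamma
            A.append(row)
        sol = solve(A, [0.0] + y)
        self.b, self.alpha = sol[0], sol[1:]


def q_target(model, reward, next_state, actions, discount=0.95, done=False):
    """One-step Q-learning target: r + discount * max over a' of Q(s', a')."""
    if done:
        return reward
    return reward + discount * max(model.predict(next_state, a)
                                   for a in actions)
```

In an online loop, each transition would yield a target from `q_target`, which is then passed to `add_sample`, so sample collection and LS-SVM training proceed together. Refitting from scratch costs a dense linear solve of window size plus one, which is why a bounded window keeps the per-step cost fixed in this sketch.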
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.