Online support vector regression for reinforcement learning
Authors: Yu Zhenhua, Cai Yuanli
Affiliation: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, P.R. China
Funding: National High Technology Research and Development Program of China (863 Program)
Abstract: The goal of reinforcement learning is to learn the value of each state-action pair in order to maximize the total reward. For the continuous states and actions encountered in the real world, the representation of value functions is critical. Furthermore, the samples used to estimate value functions are obtained sequentially. An online support vector regression (OSVR) method is therefore set up as a function approximator to estimate value functions in reinforcement learning. OSVR updates the regression function by analyzing how the support vector set may change after new samples are inserted into the training set. To evaluate its learning ability, OSVR is applied to the mountain-car task. The simulation results indicate that OSVR has a favorable convergence speed and can solve continuous problems that are infeasible with a lookup table.
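As a rough illustration of the setup the abstract describes, the Python sketch below pairs mountain-car dynamics with an online kernel regressor used as the value-function approximator. Several parts are assumptions and are not taken from the paper: the dynamics constants follow the commonly used textbook formulation of mountain car, the names `mountain_car_step`, `features`, `OnlineKernelRegressor`, and `td_update` are hypothetical, and the regressor uses a simple passive-aggressive style update on the epsilon-insensitive loss as a stand-in. It is not the OSVR algorithm itself, which tracks how the support vector set changes as each sample is added.

```python
import math

ACTIONS = (-1, 0, 1)  # reverse, coast, forward


def mountain_car_step(pos, vel, action):
    """Advance the car one step (textbook constants, assumed here)."""
    vel += 0.001 * action - 0.0025 * math.cos(3.0 * pos)
    vel = max(-0.07, min(0.07, vel))
    pos += vel
    if pos < -1.2:  # inelastic collision with the left wall
        pos, vel = -1.2, 0.0
    done = pos >= 0.5  # goal at the top of the right hill
    return pos, vel, done


def features(pos, vel):
    """Scale the state to roughly [0, 1] in each dimension."""
    return ((pos + 1.2) / 1.7, (vel + 0.07) / 0.14)


class OnlineKernelRegressor:
    """Stand-in approximator, NOT the paper's OSVR: a Gaussian-kernel
    expansion that grows by one center whenever a sample falls outside
    the epsilon tube (passive-aggressive style update)."""

    def __init__(self, gamma=10.0, epsilon=0.01):
        self.gamma = gamma
        self.epsilon = epsilon
        self.centers = []
        self.coefs = []

    def _kernel(self, x, c):
        return math.exp(-self.gamma * sum((a - b) ** 2 for a, b in zip(x, c)))

    def predict(self, x):
        return sum(w * self._kernel(x, c)
                   for w, c in zip(self.coefs, self.centers))

    def update(self, x, target):
        err = target - self.predict(x)
        step = abs(err) - self.epsilon
        if step > 0.0:
            # k(x, x) = 1 for the Gaussian kernel, so this step moves the
            # prediction at x exactly onto the edge of the epsilon tube.
            self.centers.append(tuple(x))
            self.coefs.append(math.copysign(step, err))


def td_update(q, pos, vel, action, discount=0.99):
    """One Q-learning step with a reward of -1 per time step.
    q maps each action to its own regressor."""
    next_pos, next_vel, done = mountain_car_step(pos, vel, action)
    best_next = 0.0 if done else max(
        q[a].predict(features(next_pos, next_vel)) for a in ACTIONS)
    q[action].update(features(pos, vel), -1.0 + discount * best_next)
    return next_pos, next_vel, done
```

One regressor per action, for example `q = {a: OnlineKernelRegressor() for a in ACTIONS}`, can then be trained by calling `td_update` inside an episode loop. Because the expansion in this sketch grows with every out-of-tube sample, a practical version would need some way to bound the number of centers.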

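Keywords (translated from the Chinese record): online support vector; reinforcement learning; communication; continuous sites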

Yu Zhenhua, Cai Yuanli. Online support vector regression for reinforcement learning[J]. High Technology Letters, 2007, 13(2): 173-176.
Keywords: reinforcement learning; function approximation; support vector regression; online learning
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.