
Value Function Approximation with State Aggregation
Cite this article: HU Guang-hua, LIU Ying-min. Value Function Approximation with State Aggregation[J]. Journal of Beijing Institute of Technology, 2000, 20(3): 304-308
Authors: HU Guang-hua, LIU Ying-min
Affiliation: Department of Automatic Control, Beijing Institute of Technology, Beijing 100081
Abstract: The value function is represented and stored in a more compact form in order to solve large-scale average-reward Markov decision process (MDP) problems. A relative value iteration algorithm with state aggregation is used to approximate the value function, and its convergence is analysed with the Span semi-norm and the contraction mapping principle. The Bellman optimality equation for the aggregated state model is given. Under the Span contraction condition, the convergence of the algorithm is proved and an error bound is also derived.

Keywords: dynamic programming  state aggregation  stochastic control  value function approximation

Value Function Approximation with State Aggregation
HU Guang-hua, LIU Ying-min, WU Cang-pu. Value Function Approximation with State Aggregation[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2000, 20(3): 304-308
Authors: HU Guang-hua  LIU Ying-min  WU Cang-pu
Abstract: To represent and store cost-to-go functions more compactly than lookup tables when scaling up average-reward Markov decision processes, a relative value iteration algorithm with state aggregation was used to approximate the value function, and the Span semi-norm together with the contraction mapping principle was used to analyse its convergence. The Bellman equation for the state-aggregation model was given. The convergence result was proved and an error bound for the proposed algorithm was presented under the condition of contraction in the Span semi-norm.
Keywords: dynamic programming  Markov decision processes  compact representation  state aggregation  average reward
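
The abstract describes relative value iteration applied to an aggregated state space, with convergence measured in the Span semi-norm. The following is a minimal sketch of that general scheme, not the paper's exact formulation: the function name, the uniform averaging of values within each cluster, and the choice of reference state are assumptions made only for illustration.

    import numpy as np

    def aggregated_relative_value_iteration(P, r, clusters, tol=1e-6, max_iter=1000):
        """Sketch: relative value iteration with state aggregation for an
        average-reward MDP (illustrative assumptions, not the paper's method).

        P        : (A, S, S) transition probabilities, P[a, s, t] = P(t | s, a)
        r        : (S, A) one-step rewards
        clusters : (S,) clusters[s] = index of the aggregate state containing s
        Returns the estimated average reward and the piecewise-constant
        relative value function.
        """
        A, S, _ = P.shape
        K = clusters.max() + 1
        w = np.zeros(K)          # one relative value per aggregate state
        ref = 0                  # fixed reference state for the relative shift (assumed)
        gain = 0.0

        for _ in range(max_iter):
            h = w[clusters]                              # expand aggregate values to all states
            # Bellman lookahead: (Th)(s) = max_a [ r(s,a) + sum_t P(t|s,a) h(t) ]
            q = r + np.einsum('asj,j->sa', P, h)
            Th = q.max(axis=1)
            gain = Th[ref] - h[ref]                      # running average-reward estimate
            h_new = Th - Th[ref]                         # relative value iteration shift
            # aggregate back: uniform averaging inside each cluster (an assumed choice)
            w_new = np.array([h_new[clusters == k].mean() for k in range(K)])
            # stop when the Span semi-norm of the update, max - min, is small
            diff = w_new[clusters] - h
            if diff.max() - diff.min() < tol:
                w = w_new
                break
            w = w_new

        return gain, w[clusters]

Storing only one value per aggregate state is what makes the representation more compact than a lookup table over the full state space; the Span semi-norm stopping rule mirrors the contraction argument referred to in the abstract.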