
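Optimization Algorithms for Markov Control Processes Based on Neuro-Dynamic Programming
Cite this article: TANG Hao, XI Hong-sheng, YIN Bao-qun. Optimization algorithms for Markov control processes based on neuro-dynamic programming[J]. Journal of University of Science and Technology of China, 2001, 31(5): 549-557
Authors: TANG Hao, XI Hong-sheng, YIN Bao-qun
Affiliation: Department of Automation, University of Science and Technology of China
Funding: Supported by the National Natural Science Foundation of China (69974037) and the National High Performance Computing Foundation (00208)
Abstract: Building on Markov performance potential theory, this paper studies simulation-based optimization algorithms for Markov control processes under randomized stationary policies represented by approximation architectures such as neural networks. It analyzes the convergence of these algorithms with probability one on a single infinitely long sample path and gives a numerical example for a three-state controlled Markov process.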
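For reference, the standard relations of Markov performance potential theory can be written as below. This assumes an ergodic chain whose transition matrix P and one-step cost vector f depend on a policy parameter θ, with stationary distribution π, column vector of ones e, and the long-run average cost η as the performance measure; the paper's own notation and criterion may differ. The last line is the derivative formula that lets potentials g drive policy improvement.

```latex
\begin{aligned}
&\pi P = \pi, \qquad \pi e = 1, \qquad \eta = \pi f, \\
&(I - P + e\pi)\, g = f \qquad \text{(Poisson equation; } g \text{ are the performance potentials)}, \\
&d\pi\,(I - P) = \pi\, dP, \quad d\pi\, e = 0
  \;\Rightarrow\; d\pi = \pi\, dP\, (I - P + e\pi)^{-1}, \\
&\frac{d\eta}{d\theta} = \frac{d\pi}{d\theta} f + \pi \frac{df}{d\theta}
  = \pi \frac{dP}{d\theta}\, g + \pi \frac{df}{d\theta}.
\end{aligned}
```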

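Keywords: Markov performance potential theory; Markov control processes; randomized stationary policies; sample path; neuro-dynamic programming; stochastic decision problems
Article ID: 0253-2778(2001)05-0549-09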

Optimization Algorithms for Markov Control Processes Using Neuro-dynamic Programming
TANG Hao, XI Hong-sheng, YIN Bao-qun. Optimization Algorithms for Markov Control Processes Using Neuro-dynamic Programming[J]. Journal of University of Science and Technology of China, 2001, 31(5): 549-557
Authors: TANG Hao, XI Hong-sheng, YIN Bao-qun
Abstract: Motivated by the need for online optimization of real-world engineering systems, optimization algorithms based on a single sample path were studied for Markov control processes under randomized stationary policies. The concept of Markov performance potentials is introduced, and the policies can be represented by approximation architectures such as neural networks. Unlike traditional computation-based approaches, the policy parameters can be iterated, and an optimal (or suboptimal) randomized stationary policy can be found, using a sample path obtained by observing the operation of a real system. This optimization method is a form of the neuro-dynamic programming methodology. With a suitable choice of the algorithm parameters, the algorithms can be applied to different real systems and therefore adapt well. Finally, the convergence of the algorithms with probability one on an infinite sample path is analyzed, and a numerical example for a three-state controlled Markov chain is provided.
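To make the role of performance potentials concrete, here is a minimal, model-based sketch. It is hypothetical throughout: a made-up 3-state, 2-action chain (`P_ACT`, `COST` are illustrative numbers, not the paper's example), a softmax policy with a table of parameters `theta` standing in for a neural network, and the average-cost criterion. The helper names `policy`, `solve`, `evaluate` and `optimize` are invented for illustration. It computes the potentials `g` exactly from the Poisson equation (I - P + eπ)g = f and the gradient π(dP/dθ)g + π(df/dθ). Per the abstract, the paper's algorithms instead work from one observed sample path rather than a known model; that estimation step is not reproduced here.

```python
import math

# Hypothetical 3-state, 2-action controlled Markov chain (illustrative numbers only).
# P_ACT[a][i][j] = Pr(next state j | state i, action a). All entries are positive,
# so every randomized stationary policy induces an ergodic chain.
P_ACT = [
    [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]],  # action 0
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1]],  # action 1
]
COST = [[1.0, 3.0], [4.0, 2.0], [2.0, 5.0]]  # COST[i][a]: one-step cost
N, A = len(COST), len(P_ACT)


def policy(theta, i):
    """Softmax randomized stationary policy mu(a | i; theta)."""
    m = max(theta[i])
    w = [math.exp(t - m) for t in theta[i]]
    s = sum(w)
    return [x / s for x in w]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def evaluate(theta):
    """Return (eta, grad): average cost and its gradient via performance potentials."""
    mu = [policy(theta, i) for i in range(N)]
    P = [[sum(mu[i][a] * P_ACT[a][i][j] for a in range(A)) for j in range(N)]
         for i in range(N)]
    f = [sum(mu[i][a] * COST[i][a] for a in range(A)) for i in range(N)]
    # Stationary distribution: pi (I - P) = 0, last equation replaced by sum(pi) = 1.
    rows = [[(1.0 if i == j else 0.0) - P[i][j] for i in range(N)] for j in range(N - 1)]
    pi = solve(rows + [[1.0] * N], [0.0] * (N - 1) + [1.0])
    eta = sum(pi[i] * f[i] for i in range(N))
    # Performance potentials from the Poisson equation (I - P + e pi) g = f.
    M = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(N)] for i in range(N)]
    g = solve(M, f)
    # Row-i term of pi (dP/dtheta) g + pi (df/dtheta) for the softmax policy:
    # d eta / d theta[i][a] = pi_i mu_a (sum_j (P_a[i][j] - P[i][j]) g_j + c(i,a) - f_i)
    grad = [[pi[i] * mu[i][a] * (sum((P_ACT[a][i][j] - P[i][j]) * g[j] for j in range(N))
                                 + COST[i][a] - f[i])
             for a in range(A)] for i in range(N)]
    return eta, grad


def optimize(theta, lr=0.5, iters=200):
    """Gradient descent on theta; halve the step whenever it fails to lower eta."""
    eta, grad = evaluate(theta)
    for _ in range(iters):
        cand = [[theta[i][a] - lr * grad[i][a] for a in range(A)] for i in range(N)]
        eta_c, grad_c = evaluate(cand)
        if eta_c < eta:
            theta, eta, grad = cand, eta_c, grad_c
        else:
            lr /= 2
    return theta, eta
```

A usage example: start from the uniform policy `theta0 = [[0.0] * A for _ in range(N)]`, call `evaluate(theta0)` to get the initial average cost, then `optimize(theta0)` to iterate the parameters. The step-halving rule is a simple safeguard chosen for this sketch so that the average cost never increases; the paper's stochastic-approximation step sizes, which its convergence analysis relies on, are not reproduced. The analytic gradient can be checked against central finite differences of `evaluate`.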
Keywords: Markov performance potentials; Markov control processes; randomized stationary policies; sample path
This article is indexed in CNKI, VIP, Wanfang Data and other databases.