An on-line Adaptive Control Markov Chains by Using Reinforcement Learning

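Cite this article:	HU Guang-hua, HU Guang-tao. An on-line Adaptive Control Markov Chains by Using Reinforcement Learning [J]. Journal of Yunnan University (Natural Sciences), 2000, 22(1): 9-12.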

Authors:	HU Guang-hua, HU Guang-tao

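Affiliations:	1. Department of Mathematics, Yunnan University, Kunming, Yunnan 650091; 2. Department of Statistics, Yunnan University, Kunming, Yunnan 650091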

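Abstract:	This paper discusses a reinforcement learning algorithm for controlled Markov chains under the average-reward criterion. The goal is to find an optimal control policy that maximizes the long-run expected average reward per stage. Because the state transition matrices and reward vectors are not known in advance, an adaptive control method is required. By introducing a neural network structure consisting of an actor and a critic, the learning unit is able, through continued learning, to eventually discover the optimal policy. The actor's parameters are revised continually during learning, and their values at each time step correspond to a stochastic control policy. The critic is used to estimate these parameters in order to find the optimal control policy.
Keywords:	reinforcement learning; adaptive critic; Markov chain control problem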
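For reference, the average-reward criterion described in the abstract is usually written as below. The notation is standard and assumed here, not taken from the paper: $\pi$ is a stationary (possibly stochastic) policy, $s_t$ and $a_t$ are the state and action at step $t$, $r$ is the one-step reward, and $\rho^{\pi}$ is the long-run average reward, assuming the limit exists (for example, when the chain is ergodic under every policy).

```latex
\rho^{\pi} = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{N-1} r(s_t, a_t)\right],
\qquad
\pi^{*} \in \arg\max_{\pi} \rho^{\pi}.
```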
An on-line Adaptive Control Markov Chains by Using Reinforcement Learning

HU Guang-hua, HU Guang-tao. An on-line Adaptive Control Markov Chains by Using Reinforcement Learning [J]. Journal of Yunnan University (Natural Sciences), 2000, 22(1): 9-12.

Authors:	HU Guang-hua, HU Guang-tao

Abstract:	An average-reward reinforcement learning algorithm for controlled Markov chains is presented. The objective is to find an optimal policy that maximizes the expected average reward per time step over an infinite horizon. The transition matrices and payoff structures are not known a priori, so adaptive control methods are necessary. A neural network structure, consisting of an actor and a critic, is provided for the agent. The parameters of the actor, which determine a stochastic control strategy, are updated at each time step using a simple learning scheme. The adaptive critic is used to estimate these parameters in order to find the optimal policy.

Keywords:	reinforcement learning; Markov decision processes; average reward; adaptive critic; R-learning
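The abstract does not give the update equations, so the following is only a minimal sketch of a generic tabular average-reward actor-critic, not the authors' scheme (which uses neural networks for the actor and the critic). The two-state chain, the step sizes, and the names `make_chain`, `softmax`, and `actor_critic` are all assumptions made for illustration. The learner never reads the transition matrices or rewards directly; it only samples from them, which mirrors the on-line adaptive setting the abstract describes.

```python
import numpy as np


def make_chain():
    """Hypothetical 2-state, 2-action controlled Markov chain (not from the paper)."""
    # P[a, s, s2] = probability of moving from state s to state s2 under action a.
    P = np.array([
        [[0.9, 0.1], [0.8, 0.2]],   # action 0 tends to lead to state 0
        [[0.2, 0.8], [0.1, 0.9]],   # action 1 tends to lead to state 1
    ])
    # R[s, a] = one-step reward; only state 1 pays off.
    R = np.array([[0.0, 0.0],
                  [1.0, 1.0]])
    return P, R


def softmax(x):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = np.exp(x - x.max())
    return z / z.sum()


def actor_critic(P, R, steps=50_000, alpha_v=0.05, alpha_theta=0.01,
                 alpha_rho=0.01, seed=0):
    """Tabular average-reward actor-critic driven by sampled transitions only."""
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    theta = np.zeros((n_states, n_actions))  # actor: action preferences per state
    v = np.zeros(n_states)                   # critic: differential state values
    rho = 0.0                                # running estimate of the average reward
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])               # current stochastic policy in state s
        a = rng.choice(n_actions, p=pi)
        r = R[s, a]                          # observed reward (model is hidden from the learner)
        s_next = rng.choice(n_states, p=P[a, s])

        # Differential TD error: reward relative to the average, plus value change.
        delta = r - rho + v[s_next] - v[s]

        rho += alpha_rho * delta             # update average-reward estimate
        v[s] += alpha_v * delta              # critic update

        # Actor update: gradient of log softmax probability of the chosen action.
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0
        theta[s] += alpha_theta * delta * grad_log_pi

        s = s_next
    return theta, v, rho


if __name__ == "__main__":
    P, R = make_chain()
    theta, v, rho = actor_critic(P, R)
    print("greedy action per state:", theta.argmax(axis=1))
    print("estimated average reward:", rho)
```

In this toy chain the best policy is to always choose action 1, which keeps the process in the rewarding state most of the time; its average reward is 8/9, the stationary probability of state 1 under that action. The critic's temporal-difference error serves as the learning signal for the actor, which is the general division of labor the abstract attributes to the two components.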
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
