Temporal Difference Learning Based on Linear Approximation

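Cite this article:	HU Guang-hua, HU Guang-tao. Temporal Difference Learning Based on Linear Approximation[J]. Journal of Yunnan University (Natural Sciences), 2002, 24(1): 9-13.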

Authors:	HU Guang-hua, HU Guang-tao

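Affiliations:	1. Department of Mathematics, Yunnan University, Kunming, Yunnan 650091; 2. Department of Statistics, Yunnan University, Kunming, Yunnan 650091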

Funding:	Supported by the Natural Science Foundation of Yunnan Province (2000A0001-1M)

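Abstract:	This paper discusses temporal difference (TD(λ)) learning and least-squares temporal difference (LSTD) learning algorithms based on linear approximation, used to approximate the relative value function of a Markov decision process under the average reward criterion. The approximation is a linear combination of feature functions, and its weights are updated incrementally.
Keywords:	temporal difference learning; linear approximation; Markov decision processes; least-squares algorithm; average reward criterion
Article ID:	0258-7971(2002)01-0009-05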
Temporal Difference Learning Based on Linear Approximation

HU Guang-hua, HU Guang-tao. Temporal Difference Learning Based on Linear Approximation[J]. Journal of Yunnan University (Natural Sciences), 2002, 24(1): 9-13.

Authors:	HU Guang-hua, HU Guang-tao

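Institution:	HU Guang-hua (1. Department of Mathematics, Yunnan University, Kunming, Yunnan 650091); HU Guang-tao (2. Department of Statistics, Yunnan University, Kunming, Yunnan 650091)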

Abstract:	TD(λ) learning and least-squares temporal difference (LSTD) learning algorithms are proposed for approximating the bias (relative value) function of an average-reward Markov decision problem. The approximations are linear combinations of fixed feature functions whose weights are updated incrementally along a single endless trajectory of the process.

Keywords:	temporal difference learning; linear approximation; Markov decision processes; least-squares algorithm; average reward criteria
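
The abstract names two weight-update schemes without giving their formulas, so the following is only a minimal sketch of how average-reward TD(λ) and LSTD with linear features are commonly written in the literature; it is not taken from the paper and may differ from the authors' algorithms. The two-state chain, the single feature, the step-size schedule, and all function names are assumptions chosen for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def td_and_lstd(P, reward, phi, lam=0.5, steps=50000, seed=0):
    """Run one trajectory of a fixed-policy chain and return
    (TD(lambda) weights, LSTD weights, average-reward estimate).

    P[x]      : transition probabilities out of state x
    reward[x] : reward received in state x
    phi[x]    : feature vector of state x (assumed not to span constants)
    """
    rng = random.Random(seed)
    k = len(phi[0])
    theta = [0.0] * k                    # TD(lambda) weights
    z = [0.0] * k                        # eligibility trace
    mu = 0.0                             # running average of rewards
    A = [[0.0] * k for _ in range(k)]    # LSTD: sum of z (phi(x) - phi(x'))^T
    zr = [0.0] * k                       # LSTD: sum of z * r
    zs = [0.0] * k                       # LSTD: sum of z
    x = 0
    for t in range(steps):
        x_next = rng.choices(range(len(P)), weights=P[x])[0]
        r = reward[x]
        # Trace update, then the average-reward temporal difference.
        z = [lam * zi + fi for zi, fi in zip(z, phi[x])]
        d = r - mu + dot(phi[x_next], theta) - dot(phi[x], theta)
        step = 10.0 / (100.0 + t)        # assumed diminishing step size
        theta = [th + step * d * zi for th, zi in zip(theta, z)]
        # LSTD statistics are accumulated incrementally along the same path.
        diff = [a - b for a, b in zip(phi[x], phi[x_next])]
        for i in range(k):
            zr[i] += z[i] * r
            zs[i] += z[i]
            for j in range(k):
                A[i][j] += z[i] * diff[j]
        mu += (r - mu) / (t + 1)
        x = x_next
    # b = sum of z * (r - mu), using the final average-reward estimate.
    b = [zr[i] - mu * zs[i] for i in range(k)]
    return theta, solve(A, b), mu

# Hypothetical two-state chain: reward 1 in state 1, reward 0 in state 0.
P = [[0.9, 0.1], [0.5, 0.5]]
reward = [0.0, 1.0]
phi = [[0.0], [1.0]]                     # one feature: indicator of state 1
theta_td, theta_lstd, mu = td_and_lstd(P, reward, phi)
print(theta_td, theta_lstd, mu)
```

For this toy chain the stationary distribution is (5/6, 1/6), so the average reward is 1/6, and the Poisson equation gives a relative-value difference h(1) - h(0) = 5/3. Both weight estimates should therefore settle near 5/3, with LSTD typically less noisy because it solves for the weights from accumulated statistics instead of taking stochastic steps.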
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.