Temporal Difference Learning Based on Linear Approximation

Abstract: This paper presents TD(λ) learning and least-squares temporal difference (LSTD) learning algorithms, both based on linear approximation, for approximating the relative (bias) value function of a Markov decision process under the average-reward criterion. The approximation is a linear combination of fixed feature functions, and its weights are updated incrementally during a single, endless run of the process.
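The sketch below illustrates both update rules in Python with NumPy. It rests on several assumptions. The policy is fixed, which turns the MDP into a Markov chain with a reward attached to each state. The chain, rewards, feature matrix `phi`, λ = 0.7 and the step-size schedule are all hypothetical choices made for illustration. The updates follow a common average-reward form and are not confirmed to be the paper's exact algorithms. Each method estimates the average reward μ together with weights θ, so that `phi[x] @ theta` approximates the relative value function.

```python
import numpy as np


def simulate(P, x0, steps, rng):
    """Sample a trajectory x_0..x_steps of the chain induced by a fixed policy."""
    cum = np.cumsum(P, axis=1)            # row-wise CDFs of the transition matrix
    u = rng.random(steps)
    xs = [x0]
    for t in range(steps):
        nxt = int(np.searchsorted(cum[xs[-1]], u[t], side="right"))
        xs.append(min(nxt, len(P) - 1))   # guard against cumsum rounding below 1.0
    return xs


def td_lambda_avg(xs, r, phi, lam=0.7, a=0.5):
    """Average-reward TD(lambda); relative values approximated as phi[x] @ theta."""
    theta = np.zeros(phi.shape[1])
    z = np.zeros(phi.shape[1])            # eligibility trace
    mu = 0.0                              # running average-reward estimate
    for t in range(len(xs) - 1):
        x, y = xs[t], xs[t + 1]
        gamma = a / (t + 100) ** 0.6      # assumed diminishing step size
        delta = r[x] - mu + phi[y] @ theta - phi[x] @ theta  # TD error
        z = lam * z + phi[x]
        theta = theta + gamma * delta * z
        mu += (r[x] - mu) / (t + 1)       # incremental sample mean of rewards
    return theta, mu


def lstd_lambda_avg(xs, r, phi, lam=0.7):
    """Average-reward LSTD(lambda): accumulate A and b incrementally, solve once."""
    k = phi.shape[1]
    A, b, z = np.zeros((k, k)), np.zeros(k), np.zeros(k)
    mu = np.mean([r[x] for x in xs[:-1]])  # batch average-reward estimate
    for t in range(len(xs) - 1):
        x, y = xs[t], xs[t + 1]
        z = lam * z + phi[x]
        A += np.outer(z, phi[x] - phi[y])
        b += z * (r[x] - mu)
    return np.linalg.solve(A, b), mu


# Hypothetical 3-state chain, rewards and features (not taken from the paper).
P = np.array([[0.5, 0.5, 0.0], [0.2, 0.5, 0.3], [0.3, 0.0, 0.7]])
r = np.array([1.0, 0.0, 2.0])
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

xs = simulate(P, 0, 200_000, np.random.default_rng(0))
theta_td, mu_td = td_lambda_avg(xs, r, phi)
theta_ls, mu_ls = lstd_lambda_avg(xs, r, phi)
print("TD(lambda):  ", theta_td, mu_td)
print("LSTD(lambda):", theta_ls, mu_ls)
```

For this toy chain the stationary distribution is uniform, which makes the average reward μ = 1. The vector h = (10/3)(1, 1, 2) solves the Poisson equation h = r − μe + Ph, where e is the all-ones vector. Because h = Φθ with θ = (10/3, 10/3), both estimates should land near that vector, and TD(λ) will be noisier because of its stochastic step sizes. LSTD avoids TD's step-size tuning at the price of a k×k linear solve. The features deliberately leave out the constant vector: relative values are defined only up to an additive constant, so including e in the feature span would make A singular.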