HU Guang-hua, HU Guang-tao. Temporal Difference Learning Based on Linear Approximation[J]. Journal of Yunnan University: Natural Sciences Edition, 2002, 24(1): 9-13.

Temporal Difference Learning Based on Linear Approximation

  • Abstract: This paper discusses temporal difference (TD(λ)) learning and least-squares temporal difference (LSTD) learning algorithms, both based on linear approximation, for approximating the relative (bias) value function of a Markov decision process under the average-reward criterion. The approximation is a linear combination of fixed feature functions, and its weights are updated incrementally along a single, unending trajectory of the process.
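The incremental weight update described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used average-reward form of TD(λ) with an eligibility trace; the abstract does not give the paper's exact update rule, so this is not necessarily the authors' algorithm. The function name `td_lambda_average_reward`, the two-state chain, the one-hot features, and the step size are all hypothetical choices made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def td_lambda_average_reward(trajectory, features, lam=0.5, step=0.05):
    """Average-reward TD(lambda) with a linear value approximation (sketch).

    trajectory: iterable of (state, reward, next_state) tuples taken from
                one continuing run of the process.
    features:   dict mapping each state to its feature vector (a list).
    Returns the weight vector and the running average-reward estimate.
    """
    n = len(next(iter(features.values())))
    theta = [0.0] * n      # weights of the linear approximation
    trace = [0.0] * n      # eligibility trace
    avg_reward = 0.0       # running estimate of the average reward

    for t, (s, r, s_next) in enumerate(trajectory):
        phi, phi_next = features[s], features[s_next]
        # Temporal difference under the average-reward criterion:
        # the reward is measured relative to the average-reward estimate.
        delta = r - avg_reward + dot(theta, phi_next) - dot(theta, phi)
        # Decay the trace and add the current features.
        trace = [lam * z + f for z, f in zip(trace, phi)]
        # Incremental weight update along the trace.
        theta = [w + step * delta * z for w, z in zip(theta, trace)]
        # Running mean of the rewards seen so far.
        avg_reward += (r - avg_reward) / (t + 1)

    return theta, avg_reward


def two_state_chain(n_steps):
    """Hypothetical test process: states 0 and 1 alternate; only state 0 pays."""
    s = 0
    for _ in range(n_steps):
        r = 1.0 if s == 0 else 0.0
        yield s, r, 1 - s
        s = 1 - s


# One-hot features, chosen only to keep the illustration small.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
theta, avg_reward = td_lambda_average_reward(two_state_chain(5000), features)
relative_value_gap = theta[0] - theta[1]
```

With one-hot features the weights are determined only up to a common additive constant, so the meaningful quantity in this sketch is the difference `theta[0] - theta[1]`, which plays the role of the relative value of state 0 with respect to state 1.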

     
