Citation: HU Guang-hua (胡光华). A stochastic approximation for parameters Markov decision processes[J]. Journal of Yunnan University: Natural Sciences Edition, 2003, 25(5): 377-380.

A stochastic approximation for parameters Markov decision processes

  • Abstract: This paper studies a stochastic gradient algorithm for average-reward Markov decision processes (MDPs) that depend on a parameter vector. Using the relationship between the average-reward and discounted-reward criteria, a new expression for the gradient of the objective function is derived. A stochastic approximation algorithm based on a single sample path is then presented, and its estimate is proved to converge to that gradient with probability 1.
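The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming a well-known estimator of the same family (the discounted eligibility-trace estimator in the style of Baxter and Bartlett's GPOMDP), not necessarily the paper's own method. The two-state chain, the constant `Q_BACK`, the discount factor `beta`, and all function names are hypothetical choices made for illustration. The sketch follows one sample path, accumulates a discounted trace of score functions, and averages reward times trace to estimate the gradient of the average reward.

```python
import math
import random

Q_BACK = 0.5  # hypothetical fixed probability of moving from state 1 back to state 0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def estimate_gradient(theta, beta=0.9, steps=200_000, seed=0):
    """Single-path estimate of d(average reward)/d(theta) on a toy chain.

    Toy chain (an assumption, not taken from the paper):
      state 0 -> state 1 with probability p = sigmoid(theta), else stays in 0
      state 1 -> state 0 with probability Q_BACK, else stays in 1
      reward is 1 in state 1 and 0 in state 0
    """
    rng = random.Random(seed)
    p = sigmoid(theta)
    dp = p * (1.0 - p)  # dp/dtheta
    state = 0
    z = 0.0      # discounted trace of score functions
    delta = 0.0  # running average of reward * trace
    for t in range(steps):
        if state == 0:
            if rng.random() < p:
                nxt, score = 1, dp / p            # d log p / d theta
            else:
                nxt, score = 0, -dp / (1.0 - p)   # d log(1 - p) / d theta
        else:
            nxt = 0 if rng.random() < Q_BACK else 1
            score = 0.0  # this transition does not depend on theta
        z = beta * z + score
        reward = float(nxt)
        delta += (reward * z - delta) / (t + 1)
        state = nxt
    return delta


def exact_gradient(theta):
    """Closed form for the toy chain: average reward is p / (p + Q_BACK)."""
    p = sigmoid(theta)
    return Q_BACK / (p + Q_BACK) ** 2 * p * (1.0 - p)


if __name__ == "__main__":
    print("estimate:", estimate_gradient(0.0))
    print("exact:   ", exact_gradient(0.0))
```

In this toy chain at `theta = 0` the next state is uniform regardless of the current state, so the discounted approximation carries no bias and the estimate should settle near the closed-form value. In general, an estimator of this kind targets a discounted approximation of the gradient, and a `beta` closer to 1 reduces bias at the cost of higher variance.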

     
