A stochastic approximation algorithm for parameterized Markov decision processes

  • Abstract: This paper studies a stochastic gradient algorithm for average-reward Markov decision processes (MDPs) that depend on a parameter vector. Using the relationship between the average-reward and discounted-reward criteria, a new expression for the gradient of the objective function is derived. A stochastic approximation algorithm based on a single sample path is then presented, and the algorithm is proved to converge to the gradient with probability 1.
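The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea: estimating the gradient of the average reward from one sample path by using a discount factor. Everything here is an assumption made for illustration, not the paper's method: the two-state chain, the sigmoid parameterization, the reward vector, the discount factor `beta`, and all function names are hypothetical. The estimator is a generic discounted likelihood-ratio trace.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def transition_probs(theta):
    """Hypothetical 2-state chain: only row 0 depends on theta."""
    p = sigmoid(theta)
    return [[1.0 - p, p], [0.5, 0.5]]


def transition_grads(theta):
    """Derivative of each transition probability with respect to theta."""
    p = sigmoid(theta)
    dp = p * (1.0 - p)
    return [[-dp, dp], [0.0, 0.0]]


def estimate_gradient(theta, beta=0.9, steps=200_000, seed=0):
    """Estimate d(average reward)/d(theta) from one simulated path."""
    rng = random.Random(seed)
    reward = [0.0, 1.0]  # assumed reward: 1 in state 1, 0 in state 0
    P = transition_probs(theta)
    dP = transition_grads(theta)
    state, trace, estimate = 0, 0.0, 0.0
    for t in range(1, steps + 1):
        nxt = 1 if rng.random() < P[state][1] else 0
        # Discounted trace of the likelihood-ratio terms dP/P along the path.
        trace = beta * trace + dP[state][nxt] / P[state][nxt]
        # Running average of (reward at the new state) times (trace).
        estimate += (reward[nxt] * trace - estimate) / t
        state = nxt
    return estimate


def exact_gradient(theta):
    """Closed form for this toy chain, where eta = p / (p + 0.5)."""
    p = sigmoid(theta)
    return 0.5 / (p + 0.5) ** 2 * p * (1.0 - p)


if __name__ == "__main__":
    print(estimate_gradient(0.0), exact_gradient(0.0))
```

In this toy chain the two rows of the transition matrix coincide at `theta = 0`, so the discounted estimate targets the exact gradient there. For other values of `theta`, a discount factor below 1 generally introduces a bias that shrinks as `beta` approaches 1, at the cost of higher variance.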

     
