A Function-Based Method for Accelerating Reinforcement Learning in Machine Learning

Abstract: The value function must be updated repeatedly until it converges, which is the fundamental reason reinforcement learning is slow. A learning method that updates the value function in batches is proposed, accelerating reinforcement learning by speeding up value-function convergence. During each training episode, the state transition sequence from the initial state to the current state is recorded, and the shortest state paths from other states to the current state are derived from it. The updated value function of the current state can then be propagated backward along these shortest paths, so that a batch of value functions is updated at once. Simulation experiments on finding the shortest path in a grid-world environment indicate that the method significantly increases the update frequency of the value function and shortens learning time.
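The batch update described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not state: states are grid cells given as `(row, col)` tuples, every step has a fixed reward of `-1`, the discount factor is `1`, shortest paths are found by a breadth-first search over the recorded transitions, and a state's value is only overwritten when the propagated value is better. The names `shortest_paths_to` and `batch_update` are hypothetical.

```python
from collections import deque


def shortest_paths_to(current, transitions):
    """Breadth-first search backward from `current` over recorded transitions.

    Returns (successor, order): `successor[s]` is the next state on a
    shortest recorded path from s to `current`, and `order` lists the
    reachable states from nearest to farthest.
    """
    # Reverse adjacency: s is a predecessor of s_next for each recorded step.
    preds = {}
    for s, s_next in transitions:
        preds.setdefault(s_next, set()).add(s)

    successor, order = {}, []
    seen = {current}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        for p in sorted(preds.get(node, ())):  # sorted for determinism
            if p not in seen:
                seen.add(p)
                successor[p] = node
                order.append(p)
                queue.append(p)
    return successor, order


def batch_update(values, current, transitions, step_reward=-1.0, gamma=1.0):
    """Propagate the value of `current` backward along shortest paths.

    Assumption: a one-step backup V(s) = r + gamma * V(next), applied
    only when it improves on the stored value of s.
    """
    successor, order = shortest_paths_to(current, transitions)
    for s in order:  # nearest first, so V(successor[s]) is already refreshed
        candidate = step_reward + gamma * values[successor[s]]
        if candidate > values.get(s, float("-inf")):
            values[s] = candidate
    return values


# Recorded episode containing a detour: (0,1) -> (1,1) -> (0,1).
trajectory = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 2)]
transitions = list(zip(trajectory, trajectory[1:]))
values = batch_update({(0, 2): 0.0}, current=(0, 2), transitions=transitions)
```

In this toy episode the detour through `(1, 1)` is ignored for `(0, 0)`: its value is backed up along the two-step path `(0, 0) -> (0, 1) -> (0, 2)`, and all three earlier states are updated in a single pass rather than one per step.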