Abstract:
The value function must be refined repeatedly until it converges, which is the main reason reinforcement learning is slow. A learning algorithm that updates the value function in batches is proposed to speed up learning by increasing the refinement frequency of the value function. From the state-action transition sequence recorded during a training episode, the algorithm discovers the shortest state trajectories from other states to the current state. The refined value of the current state is then propagated backward along these shortest trajectories, so that a batch of values is refined immediately. Experiments on finding the shortest path in a Grid-World show that this approach significantly increases the refinement frequency of the value function and shortens learning time.
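The backward propagation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a tabular Q-function, a Q-learning style backup, and a breadth-first search over the reversed recorded transitions as the way to find shortest trajectories. The names `shortest_predecessor_edges` and `batch_backup`, the parameters `alpha` and `gamma`, and the corridor example are all hypothetical, and the sketch keeps only one shortest edge per state.

```python
from collections import defaultdict, deque


def shortest_predecessor_edges(transitions, target):
    """Breadth-first search backward from `target` over recorded transitions.

    `transitions` is a list of (state, action, reward, next_state) tuples.
    Returns one edge per reachable state, ordered by increasing distance to
    `target`, so each edge lies on a shortest recorded trajectory.
    """
    reverse = defaultdict(list)  # next_state -> [(state, action, reward)]
    for s, a, r, s_next in transitions:
        reverse[s_next].append((s, a, r))

    visited = {target}
    queue = deque([target])
    edges = []
    while queue:
        s_next = queue.popleft()
        for s, a, r in reverse[s_next]:
            if s not in visited:
                visited.add(s)
                edges.append((s, a, r, s_next))
                queue.append(s)
    return edges


def batch_backup(q, transitions, target, alpha=0.5, gamma=0.9):
    """Apply one assumed Q-learning backup per edge, nearest states first.

    Returns the number of entries refined in this batch.
    """
    edges = shortest_predecessor_edges(transitions, target)
    for s, a, r, s_next in edges:
        best_next = max(q[s_next].values(), default=0.0)
        q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return len(edges)


# Hypothetical episode in a 4-state corridor; the agent backtracks once
# before reaching the goal state 3 (reward 1.0 on entering it).
episode = [
    (0, "R", 0.0, 1),
    (1, "L", 0.0, 0),
    (0, "R", 0.0, 1),
    (1, "R", 0.0, 2),
    (2, "R", 1.0, 3),
]
q = defaultdict(lambda: defaultdict(float))
refined = batch_backup(q, episode, target=3, alpha=1.0, gamma=0.9)
print(refined, dict(q[2]), dict(q[1]), dict(q[0]))
```

In this sketch a single call refines the values of all three states leading to the goal, whereas a one-step update would refine only the state visited last. The backtracking move `(1, "L", 0.0, 0)` is ignored because it does not lie on a shortest trajectory to the target.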