Simulated annealing based on deep reinforcement learning for solving the two-echelon vehicle routing problem
-
Abstract
A simulated annealing algorithm based on deep reinforcement learning (SADRL) is proposed to solve the two-echelon vehicle routing problem (2E-VRP), which is common in practical logistics, with the objective of minimizing the total route length. The 2E-VRP consists of two coupled sub-stages: a customer-satellite allocation stage and a delivery route planning stage. Because different customer-satellite allocation schemes affect the optimization of the subsequent delivery routes, the solution space of the 2E-VRP is large and complex. To exploit this structure, SADRL first applies a key-value encoding and decoding scheme to the customer-satellite allocation problem and solves it with the simulated annealing (SA) algorithm. Given an allocation, the 2E-VRP can be decomposed into multiple VRP subproblems. Second, based on this decomposition, the attention model-VRP (AM-VRP) trained by reinforcement learning is used to obtain high-quality delivery routes for the VRP subproblems. This allows the quality of a decomposition scheme to be evaluated quickly, reduces the complexity of the problem, and guides the algorithm toward high-quality regions of the complex solution space. Finally, a variable neighborhood descent with destruction/reconstruction operations (VND-DRO) algorithm is designed to further optimize the delivery routes of the decomposed VRP subproblems, providing an intensive local search within these high-quality regions. Experiments on datasets of different scales confirm the effectiveness of the proposed SADRL in solving the 2E-VRP.
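The allocate-then-route idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract. The decoding rule (one real-valued key per customer, mapped to a satellite index) is a hypothetical reading of "key-value encoding". A greedy nearest-neighbor tour stands in for the AM-VRP evaluator. Vehicle capacities are ignored, and the first echelon is simplified to one depot round trip per used satellite. The VND-DRO refinement step is omitted. All function names and parameter values are illustrative.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def decode(keys, num_satellites):
    """Map each customer's key in [0, 1) to a satellite index (assumed decoding rule)."""
    return [min(int(k * num_satellites), num_satellites - 1) for k in keys]

def nearest_neighbor_length(start, points):
    """Length of a greedy closed tour from `start`; a cheap stand-in for AM-VRP."""
    remaining, current, total = list(points), start, 0.0
    while remaining:
        nxt = min(remaining, key=lambda p: dist(current, p))
        total += dist(current, nxt)
        remaining.remove(nxt)
        current = nxt
    return total + dist(current, start)

def total_length(keys, depot, satellites, customers):
    """Evaluate an allocation: one routing subproblem per used satellite."""
    alloc = decode(keys, len(satellites))
    total = 0.0
    for s, sat in enumerate(satellites):
        served = [c for c, a in zip(customers, alloc) if a == s]
        if served:
            total += 2 * dist(depot, sat)  # simplified first echelon: one round trip
            total += nearest_neighbor_length(sat, served)
    return total

def anneal(depot, satellites, customers, t0=10.0, alpha=0.995, iters=2000, seed=0):
    """Simulated annealing over the customer keys with geometric cooling."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in customers]
    cost = total_length(keys, depot, satellites, customers)
    best_keys, best_cost = keys[:], cost
    t = t0
    for _ in range(iters):
        cand = keys[:]
        cand[rng.randrange(len(cand))] = rng.random()  # re-draw one customer's key
        cand_cost = total_length(cand, depot, satellites, customers)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            keys, cost = cand, cand_cost
            if cost < best_cost:
                best_keys, best_cost = keys[:], cost
        t *= alpha
    return decode(best_keys, len(satellites)), best_cost
```

For example, `anneal((0, 0), [(5, 0), (-5, 0)], [(6, 1), (4, -1), (-6, 1), (-4, -1)])` returns a list with one satellite index per customer, together with the best total length found. In this sketch the routing evaluator is the part that a learned model such as AM-VRP would replace.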
-
-