基于深度强化学习的模拟退火算法求解两级车辆路径问题

许海鑫; 胡蓉; 钱斌

doi:10.7540/j.ynu.20240263

基于深度强化学习的模拟退火算法求解两级车辆路径问题

Simulated annealing based on deep reinforcement learning for solving the two-echelon vehicle routing problem

摘要

摘要: 针对实际物流中广泛存在的两级车辆路径问题（two-echelon vehicle routing problem，2E-VRP），以最小化总路径长度为优化目标，提出一种基于深度强化学习的模拟退火算法（simulated annealing based on deep reinforcement learning，SADRL）进行求解. 由于2E-VRP包含两个相互耦合的子阶段，即客户—中转站分配阶段和配送路径规划阶段. 不同的客户–中转站分配方案将会影响后续的配送路径优化，故2E-VRP的解空间将十分庞大且复杂. 根据以上特点，在SADRL中，首先针对客户—中转站分配问题，设计键–值对的编解码方案，并采用模拟退火算法（simulated annealing，SA）求解客户—中转站分配问题，可将2E-VRP分解为多个VRP子问题. 然后，基于分解方案，采用强化学习训练好的注意力模型（attention model-VRP，AM-VRP）来获取VRP的优质配送路径，进而可实现对分解方案优劣的快速评价，在降低问题复杂度的同时引导算法高效的探索复杂解空间的优质解区域. 最后，针对分解后的多个VRP子问题，设计了结合破坏/重构操作的变邻域下降（variable neighborhood descent with destruction/reconstruction operations，VND_DRO）算法对其配送路径进一步优化，以实现对优质解空间的深入且细致搜索，进而发现处于复杂解空间的深层优质解. 通过在不同规模测试集上的实验验证，证明了本文所提出SADRL求解2E-VRP的有效性.

Abstract: A simulated annealing based on deep reinforcement learning (SADRL) is proposed to solve the widely prevalent two-echelon vehicle routing problem (2E-VRP) in practical logistics, with the optimization objective of minimizing the total route length. Since 2E-VRP consists of two coupled sub-stages, i.e., customer-satellite allocation stage and delivery route planning stage. Different customer-satellite allocation schemes will affect the optimization of subsequent delivery routes, so the solution space of 2E-VRP will be extensive and complex. According to this characteristic, in SADRL, firstly, a key-value encoding and decoding scheme is designed for the customer-satellite allocation problem, and the simulated annealing (SA) algorithm is used to solve the customer-satellite allocation problem. 2E-VRP can be decomposed into multiple VRP subproblems. Secondly, based on the decomposition scheme, the attention model-VRP (AM-VRP) trained by reinforcement learning is used to obtain the high-quality delivery route of VRP, which can quickly evaluate the quality of the decomposition scheme, reduce the complexity of the problem, and guide the algorithm to efficiently explore high-quality solution areas in the complex solution space. Finally, for the decomposed multiple VRP subproblems, a variable neighborhood descent with destruction/reconstruction operations (VND-DRO) algorithm was designed to further optimize their delivery routes, in order to achieve in-depth and detailed search of high-quality solution spaces and discover deep high-quality solutions in complex solution spaces. Experimental verification on datasets of different scales confirms the effectiveness of the proposed SADRL in solving 2E-VRP.

HTML全文

参考文献(32)

施引文献

资源附件(0)