XU Haixin, HU Rong, QIAN Bin. Simulated annealing based on deep reinforcement learning for solving the two-echelon vehicle routing problem[J]. Journal of Yunnan University: Natural Sciences Edition. DOI: 10.7540/j.ynu.20240263

Simulated annealing based on deep reinforcement learning for solving the two-echelon vehicle routing problem

  • A simulated annealing algorithm based on deep reinforcement learning (SADRL) is proposed to solve the two-echelon vehicle routing problem (2E-VRP), which is widespread in practical logistics, with the objective of minimizing the total route length. The 2E-VRP consists of two coupled sub-stages: a customer-satellite allocation stage and a delivery route planning stage. Because different customer-satellite allocation schemes affect the optimization of the subsequent delivery routes, the solution space of the 2E-VRP is large and complex. To address this, SADRL first designs a key-value encoding and decoding scheme for the customer-satellite allocation problem and solves that problem with the simulated annealing (SA) algorithm. With the allocation fixed, the 2E-VRP can be decomposed into multiple VRP subproblems. Second, based on this decomposition, the attention model for VRP (AM-VRP), trained by reinforcement learning, is used to obtain high-quality delivery routes for each VRP subproblem. This allows the quality of a decomposition scheme to be evaluated quickly, reduces the complexity of the problem, and guides the algorithm to explore high-quality regions of the complex solution space efficiently. Finally, a variable neighborhood descent algorithm with destruction/reconstruction operations (VND-DRO) is designed to further optimize the delivery routes of the decomposed VRP subproblems, giving a deeper and more detailed search of the high-quality regions. Experiments on datasets of different scales confirm the effectiveness of the proposed SADRL in solving the 2E-VRP.
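The allocate-then-route decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the allocation is encoded as a plain dictionary mapping customer index to satellite index (the paper's key-value encoding may differ), a nearest-neighbour tour stands in for the learned AM-VRP route constructor, vehicle capacities are ignored, and the VND-DRO refinement step is omitted. The function names `nn_route_length`, `total_length`, `nearest_allocation`, and `anneal` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nn_route_length(start, points):
    """Length of a nearest-neighbour tour from `start` through `points` and back.

    Assumption: a cheap stand-in for a learned route constructor such as AM-VRP.
    """
    remaining = list(points)
    current, total = start, 0.0
    while remaining:
        nxt = min(remaining, key=lambda p: dist(current, p))
        total += dist(current, nxt)
        remaining.remove(nxt)
        current = nxt
    return total + dist(current, start)


def total_length(depot, satellites, customers, alloc):
    """Evaluate an allocation by splitting it into one routing subproblem per satellite."""
    groups = {}
    for cust, sat in alloc.items():
        groups.setdefault(sat, []).append(customers[cust])
    # Second echelon: one tour per satellite over its allocated customers.
    second = sum(nn_route_length(satellites[s], pts) for s, pts in groups.items())
    # First echelon: one tour from the depot over the satellites actually used.
    first = nn_route_length(depot, [satellites[s] for s in groups])
    return first + second


def nearest_allocation(satellites, customers):
    """Initial allocation: every customer goes to its closest satellite."""
    return {
        c: min(range(len(satellites)), key=lambda s: dist(customers[c], satellites[s]))
        for c in range(len(customers))
    }


def anneal(depot, satellites, customers, t0=10.0, cooling=0.995, steps=2000, seed=0):
    """Simulated annealing over customer-satellite allocations."""
    rng = random.Random(seed)
    alloc = nearest_allocation(satellites, customers)
    cur = total_length(depot, satellites, customers, alloc)
    best_alloc, best = dict(alloc), cur
    t = t0
    for _ in range(steps):
        # Neighbour move: reassign one random customer to a random satellite.
        c = rng.randrange(len(customers))
        old = alloc[c]
        alloc[c] = rng.randrange(len(satellites))
        new = total_length(depot, satellites, customers, alloc)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best:
                best, best_alloc = cur, dict(alloc)
        else:
            alloc[c] = old  # reject the move
        t *= cooling
    return best_alloc, best
```

Calling `anneal(depot, satellites, customers)` with coordinate tuples returns the best allocation found and its total route length. In the method described above, the route evaluator inside the loop would be the trained AM-VRP model, and the routes of the final subproblems would then be refined by VND-DRO.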