重庆理工大学学报(自然科学) ›› 2023, Vol. 37 ›› Issue (12): 260-266.

• 智能技术 • 上一篇    下一篇

基于深度强化学习的方法求解带时间窗的旅行商问题

江明, 刘志威   

  1. 福建理工大学互联网经贸学院; 福建理工大学交通运输学院
  • 出版日期:2024-02-04 发布日期:2024-02-04
  • 作者简介:江明,男,博士,副教授,主要从事数字孪生、IT治理研究,E-mail:19842158@fjut.edu.cn;刘志威,男,硕士研究生,主要从事深度强化学习、智能交通研究,E-mail:2211801008@mail.fjut.edu.cn

Solving the traveling salesman problem with time window based on deep reinforcement learning

  • Online:2024-02-04 Published:2024-02-04

摘要: 带时间窗的旅行商问题(traveling salesman problem with time window,TSPTW)是旅行商问题的一个变种,在物资配送等方面有大量的应用。传统方法的求解时间较长且泛化性较差,为提高TSPTW的求解效率,将求解过程建模为马尔科夫决策过程,定义了状态、动作、奖励,提出了一种基于深度强化学习的Transformer加指针网络的组合模型,通过多头注意力对输入的特征进行编码,采用指针网络求出解的概率分布,所提深度学习网络通过强化学习算法进行训练。实验结果表明:所提方法对比传统的启发式求解算法,可以得到更高质量的解,相较于求解器和启发式算法,有超过数10倍的提升效果,且易于将模型拓展到不同规模的问题上

关键词: 带时间窗的旅行商, 深度强化学习, 组合优化, 注意力机制

Abstract: The Traveling Salesman Problem with Time Window (TSPTW), widely applied in material distribution, is a variant of the traveling salesman problem. To remedy such problems as long solution time and poor generalization of the traditional method as well as to to improve the solution efficiency of TSPTW, this paper models the solution process as a Markov decision process, defines the state, action and reward, and proposes a deep reinforcement learning based Transformer + pointer network model, which encodes the input features through multi-head attention, and employs the pointer network to work out the probability distribution of the solution. The deep learning network is trained by reinforcement learning algorithm. The experimental results show the proposed method obtains higher quality solutions compared with the traditional heuristic algorithms. Moreover, it markedly improves the final results and easily transfers the model to other problems of different scales compared with solvers and traditional heuristic algorithms.

中图分类号: 

  • TP18