《计算机应用》唯一官方网站 ›› 2022, Vol. 42 ›› Issue (4): 1194-1200.DOI: 10.11772/j.issn.1001-9081.2021071253

• CCF第36届中国计算机应用大会 (CCF NCCA 2021) • 上一篇    

深度强化学习解决动态旅行商问题

陈浩杰, 范江亭, 刘勇()   

  1. 黑龙江大学 计算机科学与技术学院,哈尔滨 150006
  • 收稿日期:2021-07-16 修回日期:2021-10-05 接受日期:2021-10-09 发布日期:2021-10-05 出版日期:2022-04-10
  • 通讯作者: 刘勇
  • 作者简介:陈浩杰(1995—),女,山东青岛人,硕士研究生,CCF会员,主要研究方向:机器学习、人工智能、深度学习、强化学习
    范江亭(1995—),男,黑龙江牡丹江人,硕士研究生,主要研究方向:机器学习、人工智能、强化学习、动态网络表征
  • 基金资助:
    黑龙江省自然科学基金资助项目(LH2020F043);黑龙江大学研究生创新科研项目(YJSCX2021-197HLJU)

Solving dynamic traveling salesman problem by deep reinforcement learning

Haojie CHEN, Jiangting FAN, Yong LIU()   

  1. College of Computer Science and Technology,Heilongjiang University,Harbin Heilongjiang 150006,China
  • Received:2021-07-16 Revised:2021-10-05 Accepted:2021-10-09 Online:2021-10-05 Published:2022-04-10
  • Contact: Yong LIU
  • About author:CHEN Haojie, born in 1995, M. S. candidate. Her research interests include machine learning, artificial intelligence, deep learning, reinforcement learning.
    FAN Jiangting, born in 1995, M. S. candidate. His research interests include machine learning, artificial intelligence, reinforcement learning, dynamic network representation.
  • Supported by:
    Natural Science Foundation of Heilongjiang Province(LH2020F043);Funding for Postgraduate Innovation Research Project of Heilongjiang University(YJSCX2021-197HLJU)

摘要:

针对未设计启发式算法的组合优化问题设计统一的解决方案已成为机器学习领域的一个研究热点,目前成熟的技术主要针对静态的组合优化问题,但是对于加入动态变化的组合优化问题还没有得到充分的解决。为了解决以上问题,提出一个将多头注意力机制与分层强化学习结合来求解动态图上的旅行商问题的轻量级模型Dy4TSP。首先,用以多头注意力机制为基础的预测网络处理来自图卷积神经网络的节点表征向量输入;然后,借助分布式强化学习算法训练来快速地预估图中每个节点被输出作为最优解的可能性,使得模型在不同的可能性中全面探索问题的最优解决方案空间;最后,训练后的模型将实时地生成满足具体目标奖励函数的动作决策序列。该模型在3个组合优问题上进行了评估,实验结果表明,该模型在经典旅行商系列问题中解的质量比开源求解器LKH3高0.15~0.37个单位,明显优于带有边嵌入的图注意网络(EGATE)等最新的算法;并且在其他的动态旅行商问题中可以达到0.1~1.05的最优路径差距,结果也略胜一筹。

关键词: 组合优化问题, 机器学习, 强化学习, 深度学习, 图卷积神经网络, 分布式学习, 多智能体

Abstract:

Designing a unified solution to the combinational optimization problems of undesigned heuristic algorithms has become a research hotspot in the field of machine learning. At present, mature technologies are mainly aiming at static combinatorial optimization problems, but the combinational optimization problems with dynamic changes are not fully solved. In order to solve above problems, a lightweight model called Dy4TSP (Dynamic model for Traveling Salesman Problems) was proposed, which combined multi-head-attention mechanism with distributed reinforcement learning to solve the traveling salesman problem on a dynamic graph. Firstly, the node representation vector from graph convolution neural network was processed by the prediction network based on multi-head-attention mechanism. Then, the distributed reinforcement learning algorithm was used to quickly predict the possibility that each node in the graph was output as the optimal solution, and the optimal solution space of the problems in different possibilities were comprehensively explored. Finally, the action decision sequence which could meet the specific reward function in real time was generated by the trained model. The model was evaluated on three typical combinatorial optimization problems, and the experimental results showed that the solution qualities of the proposed model are 0.15 to 0.37 units higher than those of the open source solver LKH3 (Lin-Kernighan-Helsgaun 3), and are significantly better than those of the latest algorithms such as Graph Attention Network with Edge Embedding (EGATE). The proposed model can reach an optimal path gap of 0.1 to 1.05 in other dynamic traveling salesman problems, and the results are slightly better.

Key words: combinatorial optimization problem, machine learning, reinforcement learning, deep learning, graph convolutional neural network, distributed learning, multi-agent

中图分类号: