[1] AUSIELLO G, CRESCENZI P, GAMBOSI G, et al. Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties [M]. Berlin: Springer, 1999: 21-37. 10.1007/978-3-642-58412-1_1
[2] HOCHBA D S. Approximation algorithms for NP-hard problems [J]. ACM SIGACT News, 1997, 28(2):40-52. 10.1145/261342.571216
[3] SUTTON R S, BARTO A G. Reinforcement learning [J]. Neural Systems for Control, 1998, 15(7):665-685.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017:6000-6010.
[5] LIANG E, LIAW R, NISHIHARA R, et al. RLlib: abstractions for distributed reinforcement learning [C]// Proceedings of the 35th International Conference on Machine Learning. New York: PMLR.org, 2018:3053-3062.
[6] KHALIL E, DAI H J, ZHANG Y Y, et al. Learning combinatorial optimization algorithms over graphs [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017:6351-6361.
[7] VINYALS O, FORTUNATO M, JAITLY N. Pointer networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015:2692-2700.
[8] BELLO I, PHAM H, LE Q V, et al. Neural combinatorial optimization with reinforcement learning [C]// Proceedings of the 2017 International Conference on Learning Representations. Waterloo: University of Waterloo, 2017:1-15.
[9] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014:3104-3112.
[10] HAUSKNECHT M, STONE P. Deep recurrent Q-learning for partially observable MDPs [C]// Proceedings of the 2015 AAAI Fall Symposium Series. Palo Alto: AAAI Press, 2015:29-37.
[11] KULKARNI T D, NARASIMHAN K R, SAEEDI A, et al. Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation [C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016:1-9.
[12] NAZARI M, OROOJLOOY A, SNYDER L V, et al. Reinforcement learning for solving the vehicle routing problem [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018:9860-9870.
[13] GAO L, CHEN M X, CHEN Q C, et al. Learn to design the heuristics for vehicle routing problem [EB/OL]. [2020-02-20]. 10.48550/arXiv.2002.08539
[14] ZHANG J, PAN Y Z, YANG H T, et al. Multi-agent decision-making method based on Monte Carlo Q-value function [J]. Control and Decision, 2020, 35(3):637-644. 10.13195/j.kzyjc.2018.0796
[15] SALAKHUTDINOV R, HINTON G. Replicated softmax: an undirected topic model [C]// Proceedings of the 22nd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2009:1-8.
[16] TRIVEDI R, DAI H J, WANG Y C, et al. Know-Evolve: deep temporal reasoning for dynamic knowledge graphs [C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017:3462-3471.
[17] ZAREMBA W, SUTSKEVER I, VINYALS O. Recurrent neural network regularization [EB/OL]. [2015-02-19]. https://arxiv.org/abs/1409.2329
[18] HUANG G, LI Y X, PLEISS G, et al. Snapshot ensembles: train 1, get M for free [EB/OL]. [2017-04-01]. https://arxiv.org/abs/1704.00109
[19] GLOROT X, BORDES A, BENGIO Y. Deep sparse rectifier neural networks [J]. Journal of Machine Learning Research, 2011, 15:315-323.
[20] ZHAO C X, FANG M Y. Design and implementation of logistics distribution system based on greedy algorithm [J]. Software Engineering, 2020, 23(5):21-23.
[21] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks [C]// Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. New York: PMLR.org, 2010:249-256.
[22] KINGMA D P, BA J. Adam: a method for stochastic optimization [C]// Proceedings of the 2015 International Conference on Learning Representations. Irvine, CA: Universal Publishers, Inc., 2015:1-15.
[23] NIKELSHPUR D, TAPPERT C C. Using particle swarm optimization to pre-train artificial neural networks: selecting initial training weights for feed-forward back-propagation neural networks [C]// Proceedings of the 2013 Student-Faculty Research Day, Pace University. New York: Pace University, 2013:C5.1-C5.7.
[24] KOOL W, VAN HOOF H, WELLING M. Attention, learn to solve routing problems! [C]// Proceedings of the 7th International Conference on Learning Representations. London: Publications of HSE, 2019:1-25.
[25] HELSGAUN K. An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems: technical report [R]. Roskilde: Roskilde University, 2017:1-60.