Solving dynamic traveling salesman problem by deep reinforcement learning

doi:10.11772/j.issn.1001-9081.2021071253

Journal of Computer Applications, 2022, Vol. 42, Issue 4: 1194-1200.

The 36th CCF China Conference on Computer Applications (CCF NCCA 2021)

Haojie CHEN, Jiangting FAN, Yong LIU

College of Computer Science and Technology, Heilongjiang University, Harbin, Heilongjiang 150006, China

Received: 2021-07-16 Revised: 2021-10-05 Accepted: 2021-10-09 Online: 2021-10-05 Published: 2022-04-10
Contact: Yong LIU
About the authors: CHEN Haojie, born in 1995 in Qingdao, Shandong, M.S. candidate, CCF member. Her research interests include machine learning, artificial intelligence, deep learning, and reinforcement learning.
FAN Jiangting, born in 1995 in Mudanjiang, Heilongjiang, M.S. candidate. His research interests include machine learning, artificial intelligence, reinforcement learning, and dynamic network representation.
Supported by:
Natural Science Foundation of Heilongjiang Province (LH2020F043); Postgraduate Innovation Research Project of Heilongjiang University (YJSCX2021-197HLJU)

Abstract:

Designing a unified solution for combinatorial optimization problems that lack hand-designed heuristic algorithms has become a research hotspot in machine learning. Mature techniques mainly target static combinatorial optimization problems, whereas problems with dynamic changes have not yet been adequately solved. To address this, a lightweight model called Dy4TSP (Dynamic model for Traveling Salesman Problems) was proposed, which combines a multi-head attention mechanism with distributed reinforcement learning to solve the traveling salesman problem on dynamic graphs. Firstly, a prediction network based on the multi-head attention mechanism processed the node representation vectors produced by a graph convolutional neural network. Then, a distributed reinforcement learning algorithm was used in training to quickly estimate the probability that each node in the graph is output as part of the optimal solution, so that the model comprehensively explores the space of optimal solutions across the different possibilities. Finally, the trained model generated, in real time, action decision sequences satisfying the specific objective reward function. The model was evaluated on three typical combinatorial optimization problems. Experimental results show that, on the classical traveling salesman problems, its solution quality is 0.15 to 0.37 units higher than that of the open-source solver LKH3 (Lin-Kernighan-Helsgaun 3) and significantly better than that of recent algorithms such as the Graph Attention Network with Edge Embedding (EGATE); on the other dynamic traveling salesman problems, it reaches an optimal path gap of 0.1 to 1.05, which is also slightly better.

Key words: combinatorial optimization problem, machine learning, reinforcement learning, deep learning, graph convolutional neural network, distributed learning, multi-agent
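
The decoding idea summarized in the abstract (attention scores over node representations, a softmax that turns them into per-node probabilities, and a step-by-step action sequence) can be illustrated with a minimal sketch. This is not the authors' Dy4TSP implementation: the weights are random rather than trained by reinforcement learning, the node embeddings are a random linear projection standing in for graph convolutional network output, decoding is purely greedy, and all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_scores(query, keys, w_q, w_k):
    """Scaled dot-product compatibility of one query with all keys,
    averaged over heads.
    query: (d,), keys: (n, d), w_q and w_k: (heads, d, d_k). Returns (n,)."""
    q = np.einsum("d,hdk->hk", query, w_q)    # (heads, d_k)
    k = np.einsum("nd,hdk->hnk", keys, w_k)   # (heads, n, d_k)
    d_k = w_q.shape[-1]
    scores = np.einsum("hk,hnk->hn", q, k) / np.sqrt(d_k)
    return scores.mean(axis=0)

def greedy_tour(embeddings, w_q, w_k, start=0):
    """Build a tour by repeatedly picking the most probable unvisited node."""
    n = embeddings.shape[0]
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    tour, step_probs = [start], []
    while len(tour) < n:
        scores = multi_head_scores(embeddings[tour[-1]], embeddings, w_q, w_k)
        scores = np.where(visited, -np.inf, scores)  # mask visited nodes
        probs = softmax(scores)
        step_probs.append(probs)
        nxt = int(np.argmax(probs))
        tour.append(nxt)
        visited[nxt] = True
    return tour, step_probs

def tour_length(coords, tour):
    """Euclidean length of the closed tour (returns to the start node)."""
    ordered = coords[tour]
    return float(np.linalg.norm(ordered - np.roll(ordered, -1, axis=0), axis=1).sum())

# Assumed toy setup: 8 random 2-D nodes, untrained random weights.
rng = np.random.default_rng(0)
n_nodes, d_model, n_heads, d_k = 8, 16, 4, 4
coords = rng.random((n_nodes, 2))
embeddings = coords @ rng.normal(size=(2, d_model))  # stand-in for GCN output
w_q = rng.normal(size=(n_heads, d_model, d_k)) / np.sqrt(d_model)
w_k = rng.normal(size=(n_heads, d_model, d_k)) / np.sqrt(d_model)

tour, step_probs = greedy_tour(embeddings, w_q, w_k)
length = tour_length(coords, tour)
print(tour, round(length, 3))
```

With trained weights, the per-step probabilities would be the quantity a reinforcement learning objective shapes; on a dynamic graph, the embeddings would presumably be recomputed whenever the nodes change, which this static sketch does not attempt.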

CLC Number: TP311

Haojie CHEN, Jiangting FAN, Yong LIU. Solving dynamic traveling salesman problem by deep reinforcement learning[J]. Journal of Computer Applications, 2022, 42(4): 1194-1200.

Figures/Tables: 5

Fig. 1 Neural network construction framework

Fig. 2 Multi-head attention mechanism

Fig. 3 Softmax function network structure

Fig. 4 Comparison of optimal path gap and optimal path length between different models

Fig. 5 Comparison of training loss values for different numbers of nodes
