Multi-objective routing optimization of electric power material distribution based on deep reinforcement learning

Journal of Computer Applications, 2022, Vol. 42, Issue (10): 3252-3258. DOI: 10.11772/j.issn.1001-9081.2021091582

• Frontier and comprehensive applications •

Yu XU¹, Yunyou ZHU², Xiao LIU³, Yuting DENG³, Yong LIAO⁴

1. Yongchuan Power Supply Branch, State Grid Chongqing Electric Power Company, Chongqing 402160, China
2. Information and Telecommunication Branch, State Grid Chongqing Electric Power Company, Chongqing 401120, China
3. Chongqing Jinyuyun Energy Technology Company Limited, Chongqing 400050, China
4. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China

Received: 2021-09-07 Revised: 2021-11-11 Accepted: 2021-11-17 Online: 2022-04-15 Published: 2022-10-10
Contact: Yong LIAO
About the authors: XU Yu, born in 1975, M.S., senior engineer. His research interests include power systems and their automation.
ZHU Yunyou, born in 1985, M.S., engineer. Her research interests include information and communication applications.
LIU Xiao, born in 1984, M.S., senior engineer. Her research interests include information and communication systems.
DENG Yuting, born in 1993, junior engineer. Her research interests include geographic information systems.
LIAO Yong, born in 1982, Ph.D., associate researcher. His research interests include next-generation wireless communication and artificial intelligence.
Supported by:
Science and Technology Project of State Grid Chongqing Electric Power Company (2021 Yudian Technology 8#)

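Original title (Chinese): 基于深度强化学习的电力物资配送多目标路径优化

Additional author details:
XU Yu (born 1975), male, from Chongqing, senior engineer, M.S., first contact. Research interests: power systems and their automation.
ZHU Yunyou (born 1985), female, from Chongqing, engineer, M.S. Research interests: communication and information applications.
LIU Xiao (born 1984), female, from Kunming, Yunnan, senior engineer, M.S. Research interests: communication and information systems.
DENG Yuting (born 1993), female, from Chongqing, junior engineer. Research interests: geographic information systems.
LIAO Yong (born 1982), male, from Zigong, Sichuan, associate researcher, doctoral supervisor, Ph.D., CCF distinguished member. Research interests: next-generation wireless communication, artificial intelligence. Email: liaoy@cqu.edu.cn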

Abstract

Existing optimization approaches to the Electric power material Vehicle Routing Problem (EVRP) typically consider a single objective and an incomplete set of constraints, and traditional solution algorithms are inefficient. To address this, a multi-objective routing optimization model and a solution algorithm for electric power material distribution based on Deep Reinforcement Learning (DRL) are proposed. Firstly, constraints of the distribution area, such as the distribution of gas stations and the fuel consumption of the transport vehicles, are fully considered, and a multi-objective distribution model is established with three objectives: the shortest total route length, the lowest cost, and the highest satisfaction of the material demand points. Secondly, a DRL-based routing optimization algorithm, DRL-EVRP, is designed to solve the proposed model. In the algorithm, an improved Pointer Network (Ptr-Net) is combined with the Q-learning algorithm to form a Deep Q-Network (DQN), and the reward function is the sum of the negative cumulative incremental route length and the satisfaction. Once trained, DRL-EVRP can be used directly to plan electric power material distribution routes. Simulation results show that the total route length obtained by DRL-EVRP is shorter on average than that obtained by the Extended Clarke and Wright (ECW) saving algorithm and the Simulated Annealing (SA) algorithm (an average gap of 0.40% relative to U_Best in Tab. 2, against 2.79% for SA and 11.18% for ECW), and that its computation time is within an acceptable range. The proposed algorithm can therefore optimize electric power material distribution routes more efficiently and quickly.

Key words: electric power material, multi-objective routing optimization, Vehicle Routing Problem (VRP), Deep Reinforcement Learning (DRL), Pointer Network (Ptr-Net)
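The reward described in the abstract (negative cumulative incremental route length plus satisfaction) can be illustrated with a minimal Python sketch. The abstract does not give the distance metric, the form of the satisfaction score, or any weighting between the two terms, so the straight-line distance, the per-point satisfaction dictionary, the unweighted sum, and the names `step_reward` and `episode_return` are all assumptions made for illustration only.

```python
from math import hypot


def step_reward(points, prev, nxt, satisfaction):
    """Reward for moving prev -> nxt: minus the added length plus the node's satisfaction."""
    (x0, y0), (x1, y1) = points[prev], points[nxt]
    increment = hypot(x1 - x0, y1 - y0)  # assumption: straight-line distance
    return -increment + satisfaction.get(nxt, 0.0)


def episode_return(points, route, satisfaction):
    """Undiscounted sum of step rewards along a route given as point indices."""
    return sum(
        step_reward(points, a, b, satisfaction) for a, b in zip(route, route[1:])
    )


# Toy instance (hypothetical data): depot 0 and two demand points.
points = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (3.0, 0.0)}
satisfaction = {1: 0.9, 2: 0.8}  # assumption: the depot contributes no satisfaction
total = episode_return(points, [0, 1, 2, 0], satisfaction)
```

In this toy instance the route 0 -> 1 -> 2 -> 0 has length 12 and collects a satisfaction of 1.7, so the accumulated reward is about -10.3. A shorter route or more satisfied demand points would raise the reward, which is the behavior the abstract describes.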


CLC Number:

TP301.6

Yu XU, Yunyou ZHU, Xiao LIU, Yuting DENG, Yong LIAO. Multi-objective routing optimization of electric power material distribution based on deep reinforcement learning[J]. Journal of Computer Applications, 2022, 42(10): 3252-3258.

Figures/Tables 10

Fig. 1 Electric power material distribution network

Fig. 2 Schematic diagram of improved Ptr-Net

Fig. 3 DQN structure

Fig. 4 Structure of electric power material distribution routing optimization based on DQN

Tab. 1 Network training parameter setting

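Parameter	Value	Parameter	Value
Number of iterations	20,000	Learning rate $α$	0.0005
Steps per iteration	500	Batch size	32
Replay memory size	1,000,000	Reward discount factor $γ$	0.9
Target network parameter update frequency $c$	5 (iterations)	Optimizer	Adam
Hidden-layer activation function	ReLU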
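As a reading aid for the training parameters above, the following Python sketch collects them in a configuration object and shows where the discount factor $γ$ and the update frequency $c$ enter a standard DQN update. The target formula is the textbook Q-learning target; whether DRL-EVRP uses exactly this form is not stated in this record, and the names `TrainingConfig`, `td_target`, and `should_sync_target` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from Tab. 1; field names are illustrative."""
    iterations: int = 20_000
    steps_per_iteration: int = 500
    replay_memory_size: int = 1_000_000
    target_update_every: int = 5  # c, counted in iterations
    learning_rate: float = 0.0005  # alpha
    batch_size: int = 32
    gamma: float = 0.9  # reward discount factor
    optimizer: str = "Adam"
    hidden_activation: str = "ReLU"


def td_target(reward, next_q_values, done, gamma):
    """Textbook DQN target: r if terminal, else r + gamma * max_a' Q_target(s', a')."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


def should_sync_target(iteration, cfg):
    """True when the target network would be refreshed from the online network."""
    return iteration % cfg.target_update_every == 0


cfg = TrainingConfig()
example_target = td_target(-2.0, [1.0, 3.0], done=False, gamma=cfg.gamma)
```

With a step reward of -2.0 and target-network values of 1.0 and 3.0 for the next state, the target is -2.0 + 0.9 × 3.0, about 0.7. Under these settings the target network would be synchronized every fifth iteration.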

Fig. 5 Change curve of training loss value

Fig. 6 Change curve of reward value

Tab. 2 Comparison of solution results of different algorithms for different cases

Case	U_Best f/km	U_Best H_i	SA f/km	SA Gap/%	SA H_i	ECW f/km	ECW Gap/%	ECW H_i	DRL-EVRP f/km	DRL-EVRP Gap/%	DRL-EVRP H_i
EVRP1	1,791.49	18.86	1,807.08	0.87	18.56	2,099.09	17.17	18.56	1,774.02	-0.98	18.76
EVRP2	1,574.78	18.56	1,577.30	0.16	18.46	1,678.09	6.56	17.56	1,615.55	2.59	19.06
EVRP3	1,704.48	19.40	1,709.59	0.30	17.51	1,969.70	15.56	17.41	1,631.42	-4.29	18.40
EVRP4	1,482.00	19.12	1,599.82	7.95	16.93	1,708.45	15.28	18.13	1,468.11	-0.94	17.92
EVRP5	1,689.37	17.44	1,716.57	1.61	17.84	1,938.72	14.76	16.84	1,720.43	1.84	17.74
EVRP6	1,618.65	17.11	1,649.40	1.90	17.91	1,713.34	5.85	16.91	1,677.83	3.66	18.93
EVRP7	1,713.66	18.34	1,730.45	0.98	17.88	1,756.67	2.51	17.54	1,666.95	-2.73	18.84
EVRP8	1,706.50	18.09	1,758.72	3.06	17.39	2,094.56	22.74	17.09	1,771.12	3.79	18.59
EVRP9	1,708.82	18.64	1,712.24	0.20	17.64	1,718.39	0.56	16.64	1,734.24	1.49	18.67
EVRP10	1,181.31	17.49	1,309.48	10.85	16.94	1,309.48	10.85	17.49	1,176.34	-0.42	17.94
Average	1,617.11	18.31	1,657.07	2.79	17.71	1,798.65	11.18	17.42	1,623.60	0.40	18.49
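The Gap/% columns are not defined in this record, but the tabulated values are consistent with the relative deviation of each algorithm's route length f from the U_Best length. The Python sketch below checks that reading on the EVRP1 row and on the SA column; the formula and the name `gap_percent` are inferred from the numbers, not taken from the paper.

```python
def gap_percent(f, f_best):
    """Relative deviation of route length f from the reference f_best, in percent."""
    return (f - f_best) / f_best * 100.0


# EVRP1 row of Tab. 2 (route lengths in km).
u_best = 1791.49
lengths = {"SA": 1807.08, "ECW": 2099.09, "DRL-EVRP": 1774.02}
gaps = {name: round(gap_percent(f, u_best), 2) for name, f in lengths.items()}

# Per-case SA gaps from Tab. 2, to check how the average row is formed.
sa_gaps = [0.87, 0.16, 0.30, 7.95, 1.61, 1.90, 0.98, 3.06, 0.20, 10.85]
sa_mean_gap = round(sum(sa_gaps) / len(sa_gaps), 2)
```

Under this reading, a negative gap means the algorithm found a shorter route than U_Best, as DRL-EVRP does for EVRP1. The SA average gap of 2.79 is reproduced by averaging the ten per-case gaps, not by applying the formula to the averaged route lengths.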

Tab. 3 Comparison of running time of different algorithms for different cases

Case	DRL-EVRP	SA	ECW
EVRP2	0.118	0.112	0.104
EVRP4	0.125	0.128	0.116
EVRP6	0.102	0.123	0.105
EVRP8	0.168	0.157	0.115
EVRP10	0.143	0.146	0.139
Average	0.131	0.133	0.121

Tab. 4 Location information of each point in case EVRP1

No.	Longitude	Latitude	No.	Longitude	Latitude	No.	Longitude	Latitude
0	77.4943	37.6085	8	77.0864	36.5720	16	78.2275	37.2425
1	76.3357	36.7960	9	78.9557	36.7628	17	78.4436	36.8323
2	77.0886	39.4579	10	76.3373	39.0045	18	76.9899	38.8769
3	79.1560	37.0336	11	78.1279	36.4802	19	77.4143	36.4750
4	76.8495	39.0337	12	76.1449	36.4636	20	77.1708	37.0689
5	76.0582	37.1850	13	76.0028	38.7766	21	77.3229	37.0688
6	77.7094	38.2160	14	78.8579	38.3957	22	79.4406	37.6132
7	78.2779	38.6318	15	76.5944	37.9720	23	76.9832	36.4326
