Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (10): 3252-3258. DOI: 10.11772/j.issn.1001-9081.2021091582

• Frontier and Comprehensive Applications •

Multi-objective routing optimization of electric power material distribution based on deep reinforcement learning

Yu XU¹, Yunyou ZHU², Xiao LIU³, Yuting DENG³, Yong LIAO⁴

  1. Yongchuan Power Supply Branch, State Grid Chongqing Electric Power Company, Chongqing 402160, China
  2. Information and Telecommunication Branch, State Grid Chongqing Electric Power Company, Chongqing 401120, China
  3. Chongqing Jinyuyun Energy Technology Company Limited, Chongqing 400050, China
  4. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
  • Received: 2021-09-07 Revised: 2021-11-11 Accepted: 2021-11-17 Online: 2022-04-15 Published: 2022-10-10
  • Contact: Yong LIAO
  • About author: XU Yu (first contact), born in 1975 in Chongqing, M.S., senior engineer. His research interests include power systems and their automation.
    ZHU Yunyou, born in 1985 in Chongqing, M.S., engineer. Her research interests include communication and information applications.
    LIU Xiao, born in 1984 in Kunming, Yunnan, M.S., senior engineer. Her research interests include communication and information systems.
    DENG Yuting, born in 1993 in Chongqing, junior engineer. Her research interests include geographic information systems.
    LIAO Yong, born in 1982 in Zigong, Sichuan, Ph.D., research associate, doctoral supervisor, CCF distinguished member. His research interests include next-generation wireless communication and artificial intelligence. liaoy@cqu.edu.cn
  • Supported by:
    Science and Technology Project of State Grid Chongqing Electric Power Company (2021 Yudian Technology 8#)

Abstract:

Existing optimization of the Electric power material Vehicle Routing Problem (EVRP) typically considers a single objective and an incomplete set of constraints, and traditional solution algorithms are inefficient. To address these problems, a multi-objective routing optimization model and a solution algorithm for electric power material distribution based on Deep Reinforcement Learning (DRL) are proposed. Firstly, constraints such as the distribution of gas stations in the distribution area and the fuel consumption of the material transportation vehicles are taken into account to establish a multi-objective electric power material distribution model whose objectives are the shortest total route length, the lowest cost, and the highest satisfaction of material demand points. Secondly, a DRL-based routing optimization algorithm, DRL-EVRP, is designed to solve the proposed model. DRL-EVRP combines an improved Pointer Network (Ptr-Net) with the Q-learning algorithm to form a Deep Q-Network (DQN), and uses the sum of the negative cumulative incremental route length and the satisfaction as the reward function. Once trained, DRL-EVRP can be used directly to plan electric power material distribution routes. Simulation results show that the total route length obtained by DRL-EVRP is shorter than those obtained by the Extended Clarke and Wright (ECW) saving algorithm and the Simulated Annealing (SA) algorithm, and that its computation time is within an acceptable range. The proposed algorithm can therefore optimize electric power material distribution routes more efficiently and quickly.
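The reward described in the abstract (the negative cumulative incremental route length plus the satisfaction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `episode_reward`, the use of Euclidean distance, and the representation of satisfaction as one pre-computed score per visited demand point are all assumptions made for illustration, since the abstract does not define them.

```python
import math


def episode_reward(route, satisfaction_scores):
    """Hypothetical reward: -(cumulative incremental route length) + total satisfaction.

    route: list of (x, y) coordinates in visiting order, depot included.
    satisfaction_scores: one score per visited demand point (assumed to be
        supplied by the caller; the abstract does not give its formula).
    """
    cumulative_length = 0.0
    for prev, curr in zip(route, route[1:]):
        # Incremental length added by moving from prev to curr
        # (Euclidean distance is an assumption of this sketch).
        cumulative_length += math.dist(prev, curr)
    return -cumulative_length + sum(satisfaction_scores)


# Usage: depot -> one demand point -> depot, with an assumed satisfaction of 0.8.
reward = episode_reward([(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)], [0.8])
```

Under this formulation a shorter route or a higher satisfaction both raise the reward, so a single scalar signal reflects two of the three stated objectives; how cost enters the reward is not specified in the abstract and is left out here.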

Key words: electric power material, multi-objective routing optimization, Vehicle Routing Problem (VRP), Deep Reinforcement Learning (DRL), Pointer Network (Ptr-Net)

CLC number: