Journal of Computer Applications (《计算机应用》), sole official website


Task allocation of unmanned aerial vehicles for rural last-mile delivery based on reinforcement learning

Chen Xiaojuan (陈晓娟), Zhang Wei (张薇)

  1. Harbin Engineering University
  • Received: 2024-11-26 Revised: 2025-03-27 Online: 2025-04-22 Published: 2025-04-22
  • Corresponding author: Zhang Wei (张薇)
  • Funding:
    Project funded by the State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System

Abstract: Last-mile delivery in rural areas is difficult, slow, and costly, which makes efficient and accurate scheduling of terminal deliveries particularly important. To address task allocation for multiple logistics unmanned aerial vehicles (UAVs) in rural delivery scenarios, a multi-objective UAV task allocation model is established that accounts for UAV payload capacity and maximum flight distance, with the objectives of minimizing flight distance, minimizing the number of UAVs dispatched, and avoiding time-window violations. Building on reinforcement learning, an encoder and an attention mechanism are introduced to cope with the high dimensionality of the task allocation problem, effectively simplifying the state space. A global-local search strategy is incorporated to explore the solution space while avoiding local optima, thereby improving solution quality. Finally, the weight settings are analyzed, and the best combination of weight coefficients for the sub-objective functions is determined experimentally. Simulation results show that the final path length obtained by the proposed algorithm is 8.35%, 9.88%, 10.29%, and 12.48% shorter than that of the hybrid Q-learning network method (HQM), the adaptive large neighborhood search algorithm (ALNS), the Q-learning algorithm, and the genetic algorithm (GA), respectively.

Key words: last-mile delivery, task allocation, reinforcement learning, unmanned aerial vehicle (UAV), multi-objective optimization
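
The multi-objective model summarized in the abstract can be illustrated with a minimal sketch of a weighted-sum cost over a candidate allocation. This is not the authors' formulation: the depot location, the constant flight speed, the soft due-time penalty, the weights `w_dist`, `w_uav`, `w_late`, and all function and field names are assumptions made only for illustration.

```python
import math

DEPOT = (0.0, 0.0)  # hypothetical depot location (assumption)


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def evaluate(routes, customers, capacity, max_range, speed=1.0,
             w_dist=1.0, w_uav=10.0, w_late=5.0):
    """Return (cost, feasible) for a candidate allocation.

    routes: one list of customer ids per UAV (an empty list = UAV not used).
    customers: id -> {"xy": (x, y), "demand": float, "due": float}.
    The weights are illustrative, not values from the paper.
    """
    total_dist = 0.0
    lateness = 0.0
    feasible = True
    for route in routes:
        load = sum(customers[c]["demand"] for c in route)
        pos, t, d = DEPOT, 0.0, 0.0
        for c in route:
            leg = dist(pos, customers[c]["xy"])
            d += leg
            t += leg / speed
            # Soft time window: penalize arrival after the due time.
            lateness += max(0.0, t - customers[c]["due"])
            pos = customers[c]["xy"]
        d += dist(pos, DEPOT)  # return flight to the depot
        # Hard constraints: payload capacity and maximum flight distance.
        if load > capacity or d > max_range:
            feasible = False
        total_dist += d
    used = sum(1 for r in routes if r)  # number of UAVs dispatched
    cost = w_dist * total_dist + w_uav * used + w_late * lateness
    return cost, feasible


customers = {
    1: {"xy": (3.0, 4.0), "demand": 2.0, "due": 10.0},
    2: {"xy": (6.0, 8.0), "demand": 3.0, "due": 8.0},
}
one_uav = evaluate([[1, 2]], customers, capacity=5.0, max_range=25.0)
two_uavs = evaluate([[1], [2]], customers, capacity=5.0, max_range=25.0)
```

In this toy instance, serving both customers with one UAV costs less than dispatching two, because the dispatch weight outweighs the saved distance. A learned policy or a search procedure would compare candidate allocations using a score of this kind.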

CLC number: