Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 4055-4063. DOI: 10.11772/j.issn.1001-9081.2024111670

• Frontier and comprehensive applications •

Task allocation of unmanned aerial vehicle for rural last-mile delivery based on reinforcement learning

Xiaojuan CHEN, Wei ZHANG   

  1. College of Information and Communication Engineering, Harbin Engineering University, Harbin Heilongjiang 150001, China
  • Received: 2024-11-29 Revised: 2025-03-27 Accepted: 2025-04-08 Online: 2025-04-22 Published: 2025-12-10
  • Contact: Wei ZHANG
  • About the authors: CHEN Xiaojuan, born in 2000, M. S. candidate. Her research interests include UAV logistics delivery and task allocation.
    ZHANG Wei, born in 1972, Ph. D., associate professor. Her research interests include machine learning and artificial intelligence.
  • Supported by:
    State Key Laboratory of Complex Electromagnetic Environmental Effects on Electronics and Information Systems (CEMEE2021K0101A)

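Task allocation of unmanned aerial vehicle for rural last-mile delivery based on reinforcement learning

Xiaojuan CHEN, Wei ZHANG   

  1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001
  • Corresponding author: Wei ZHANG
  • About the authors: CHEN Xiaojuan (born 2000), female, from Tai'an, Shandong, M. S. candidate, CCF member. Her main research interests are UAV logistics delivery and task allocation.
    ZHANG Wei (born 1972), female, from Jiaxiang, Shandong, associate professor, Ph. D. Her main research interests are machine learning and artificial intelligence.
  • Funding:
    Project funded by the State Key Laboratory of Complex Electromagnetic Environmental Effects on Electronics and Information Systems (CEMEE2021K0101A)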

Abstract:

Last-mile delivery in rural areas is difficult, slow, and costly, which makes efficient and accurate last-mile delivery scheduling particularly important. To address the task allocation problem of multiple logistics Unmanned Aerial Vehicles (UAVs) in rural distribution scenarios, a multi-objective UAV task allocation model was established that jointly considers UAV payload capacity and maximum flight distance, with the goals of minimizing flight distance and the number of UAVs dispatched while not violating time windows. Firstly, building on reinforcement learning, an encoder and an attention mechanism were introduced to simplify the state space effectively and thereby address the high dimensionality of task allocation. Secondly, a global-local search strategy was incorporated to explore the solution space while avoiding local optima, thereby improving solution quality. Finally, the parameter weight settings were analyzed further, and the optimal combination of weight coefficients for the sub-objective functions was obtained through experiments. Simulation results show that, in terms of final path length, the proposed algorithm SG-HQM (Sine and Gaussian HQM) achieved reductions of 8.35%, 9.88%, 10.29%, and 12.48% compared with the Hybrid Q-learning network based Method (HQM), the Adaptive Large Neighborhood Search algorithm (ALNS), the Q-learning algorithm (Q-learning), and the Genetic Algorithm (GA), respectively.
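
As a rough illustration of how a weighted multi-objective allocation cost of this kind can be evaluated, the sketch below scores a candidate assignment of tasks to UAV routes. It is not the paper's formulation: the weighted-sum form, the treatment of time windows as a lateness penalty, the depot at the origin, straight-line distances, constant speed, and every name used (`Task`, `route_length`, `time_window_violation`, `is_feasible`, `weighted_cost`, `w_dist`, `w_uav`, `w_time`) are assumptions made only for this example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Task:
    x: float        # delivery location (hypothetical planar coordinates)
    y: float
    demand: float   # parcel weight
    due: float      # latest allowed arrival time


def route_length(route, depot=(0.0, 0.0)):
    """Length of the tour depot -> tasks in order -> depot."""
    points = [depot] + [(t.x, t.y) for t in route] + [depot]
    return sum(hypot(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(points, points[1:]))


def time_window_violation(route, speed=1.0, depot=(0.0, 0.0)):
    """Total lateness along one route (0.0 if every task is on time)."""
    clock, pos, late = 0.0, depot, 0.0
    for t in route:
        clock += hypot(t.x - pos[0], t.y - pos[1]) / speed
        late += max(0.0, clock - t.due)
        pos = (t.x, t.y)
    return late


def is_feasible(route, max_payload, max_range):
    """Hard constraints: payload capacity and maximum flight distance."""
    return (sum(t.demand for t in route) <= max_payload
            and route_length(route) <= max_range)


def weighted_cost(routes, w_dist, w_uav, w_time, speed=1.0):
    """Assumed weighted sum of distance, UAV count, and lateness."""
    used = [r for r in routes if r]  # an empty route means the UAV stays home
    distance = sum(route_length(r) for r in used)
    lateness = sum(time_window_violation(r, speed) for r in used)
    return w_dist * distance + w_uav * len(used) + w_time * lateness
```

With such a cost, serving two tasks with one UAV trades a shorter total distance and a smaller fleet against possible lateness, which is why the relative weights matter when comparing candidate allocations.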

Key words: last-mile delivery, task allocation, reinforcement learning, Unmanned Aerial Vehicle (UAV), multi-objective optimization

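Abstract:

The difficulty, long duration, and high cost of "last-mile" delivery in rural areas make efficient and accurate last-mile delivery scheduling particularly important. For the task allocation problem of multiple logistics unmanned aerial vehicles (UAVs) in rural delivery scenarios, a multi-objective UAV task allocation model is established that jointly considers UAV payload capacity and maximum flight distance, with the objectives of minimizing UAV flight distance and the number of UAVs dispatched and of not violating time windows. First, building on reinforcement learning, an encoder and an attention mechanism are introduced to simplify the state space effectively and address the excessive dimensionality of task allocation. Second, a global-local search strategy is incorporated to explore the solution space while avoiding local optima, thereby improving solution quality. Finally, the parameter weight settings are analyzed further, and the optimal combination of weight coefficients for the sub-objective functions is obtained experimentally. Simulation results show that, in terms of final path length, the proposed algorithm SG-HQM (Sine and Gaussian HQM) achieves reductions of 8.35%, 9.88%, 10.29%, and 12.48% compared with the hybrid Q-learning network method (HQM), the adaptive large neighborhood search algorithm (ALNS), the Q-learning algorithm (Q-learning), and the genetic algorithm (GA), respectively.

Key words: last-mile delivery, task allocation, reinforcement learning, unmanned aerial vehicle, multi-objective optimization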

CLC Number: