Journal of Computer Applications (《计算机应用》), sole official website

Co-optimization of UAV 3D obstacle avoidance and edge computing driven by heterogeneous multi-agent reinforcement learning

陈冠良 (Chen Guanliang)1, 刘义 (Liu Yi)1, 余意 (Yu Yi)2

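  1. Guangdong University of Technology (广东工业大学)
  2. Changsha Electronic Industry School (长沙市电子工业学校)
  • Received: 2025-08-20 Revised: 2025-11-30 Online: 2026-02-12 Published: 2026-02-12
  • Corresponding author: 刘义 (Liu Yi)
  • Funding:
    Key Technologies for 6G All-Scenario On-Demand Services (6G全场景按需服务关键技术)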

Heterogeneous multi-agent reinforcement learning enabled co-optimization of UAV 3D obstacle avoidance and edge computing

  • Received: 2025-08-20 Revised: 2025-11-30 Online: 2026-02-12 Published: 2026-02-12

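Abstract: The rapid proliferation of Internet of Things (IoT) and mobile terminal devices means that computation tasks place heavy demands on network latency and energy consumption. To address this, this paper studies a multi-unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system in which UAVs provide efficient computation offloading services to ground users. In this system, multiple UAVs cooperatively process computation tasks in a three-dimensional space containing complex obstacles. To minimize the weighted sum of the maximum task completion latency over all users and the total system energy consumption, the users' discrete offloading decisions and the UAVs' continuous three-dimensional trajectories are jointly optimized. For this hybrid (discrete-continuous) optimization problem, the UOUM (User Offloading and UAV Mobility Co-optimization) algorithm, based on heterogeneous multi-agent deep reinforcement learning, is proposed. The algorithm builds a heterogeneous multi-agent framework in which user offloading decisions form a discrete action space and UAV trajectory optimization forms a continuous action space, which resolves the difficulty of optimizing over a hybrid action space. A difference reward mechanism is introduced to quantify the marginal contribution of each agent's policy and to allocate rewards accordingly, which addresses the multi-agent credit assignment problem. In addition, an artificial potential constraint converts the obstacle avoidance requirement into a differentiable safety potential function, which keeps the UAVs clear of obstacles and improves training efficiency. Simulation results show that, across different test scenarios, UOUM outperforms three comparison algorithms (offloading-only optimization, trajectory-only optimization, and a heterogeneous multi-agent reinforcement learning algorithm) in latency, energy consumption, and system cost, which verifies its effectiveness and reliability. UOUM yields marked improvements in latency optimization, energy control, and obstacle avoidance safety, and shows strong adaptability to the environment.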
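The optimization objective described in the abstract can be written compactly as follows. The notation is assumed here for illustration and is not taken from the paper: $a_k$ is the discrete offloading decision of user $k$, $\mathbf{q}_m(t)$ is the 3D position of UAV $m$ at time $t$, $T_k$ is the task completion latency of user $k$, $E_{\mathrm{total}}$ is the total system energy consumption, $\omega_1, \omega_2 \ge 0$ are the weights, and $\mathcal{O}$ is the region occupied by obstacles.

```latex
\min_{\{a_k\},\ \{\mathbf{q}_m(t)\}} \quad
  \omega_1 \max_{k \in \mathcal{K}} T_k \;+\; \omega_2 \, E_{\mathrm{total}}
\qquad \text{s.t.} \quad
  a_k \in \mathcal{A}_k \ \text{(discrete)}, \quad
  \mathbf{q}_m(t) \in \mathbb{R}^3 \setminus \mathcal{O}
```

The max term couples all users through the slowest task, while the feasible set mixes discrete and continuous variables, which is what makes the problem a hybrid one.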

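Keywords: mobile edge computing, multi-agent reinforcement learning, UAV obstacle avoidance, task offloading, three-dimensional trajectory optimization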

Abstract: The rapid growth of Internet of Things (IoT) and mobile terminal devices generates massive computational tasks that pose significant challenges for network latency and energy consumption. To address this, a multi-unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) system is studied, in which UAVs provide efficient computation offloading services for ground users. In this system, multiple UAVs collaborate to process computational tasks in a three-dimensional space with complex obstacles. To minimize the weighted sum of the maximum task completion latency over all users and the total system energy consumption, the users' discrete offloading decisions and the UAVs' continuous three-dimensional trajectories are jointly optimized. To solve this mixed (discrete-continuous) optimization problem, a heterogeneous multi-agent deep reinforcement learning-based algorithm, UOUM (User Offloading and UAV Mobility Co-optimization), is proposed. The algorithm constructs a heterogeneous multi-agent framework in which user offloading decisions are modeled with a discrete action space and UAV trajectory optimization with a continuous action space, addressing the challenge of optimizing over mixed action spaces. A difference reward mechanism is introduced to quantify the marginal contribution of each agent's policy and allocate rewards accordingly, addressing the multi-agent credit assignment problem. Additionally, artificial potential field constraints are integrated to transform obstacle avoidance requirements into differentiable safety potential functions, keeping the UAVs clear of obstacles and improving training efficiency.
Simulation results show that, across the tested scenarios, UOUM outperforms three benchmark algorithms (user offloading optimization only, UAV trajectory optimization only, and heterogeneous multi-agent reinforcement learning) in terms of latency, energy consumption, and system cost, validating its effectiveness and reliability. UOUM achieves significant improvements in latency optimization, energy control, and obstacle avoidance safety, demonstrating strong environmental adaptability.
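The three ingredients named in the abstract (weighted cost, difference rewards, and a potential-based safety term) can be illustrated with a minimal Python sketch. This is not the UOUM implementation: the function names, the default weights, the counterfactual "default action", the toy cost model, and the use of the classic repulsive potential from the artificial potential field literature are all assumptions made here for illustration.

```python
import math
from typing import Callable, List, Sequence


def system_cost(delays: Sequence[float], energies: Sequence[float],
                w_delay: float = 0.5, w_energy: float = 0.5) -> float:
    """Weighted sum of the maximum user delay and the total energy.
    The weights are hypothetical defaults, not values from the paper."""
    return w_delay * max(delays) + w_energy * sum(energies)


def difference_rewards(joint_action: Sequence[int],
                       cost_fn: Callable[[Sequence[int]], float],
                       default_action: int) -> List[float]:
    """Difference reward D_i = G(a) - G(a with a_i replaced by a default),
    where the global reward G is the negative system cost. Each D_i
    estimates agent i's marginal contribution to the team outcome."""
    global_reward = -cost_fn(joint_action)
    rewards = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action  # assumed counterfactual choice
        rewards.append(global_reward - (-cost_fn(counterfactual)))
    return rewards


def repulsive_potential(uav_pos: Sequence[float], obstacle_pos: Sequence[float],
                        influence_radius: float = 10.0, gain: float = 1.0) -> float:
    """Classic repulsive potential: 0.5 * gain * (1/d - 1/d0)^2 inside the
    influence radius d0, and 0 outside. It is smooth for d > 0, so it can
    serve as a differentiable safety penalty. Parameters are illustrative."""
    d = math.dist(uav_pos, obstacle_pos)
    if d >= influence_radius:
        return 0.0
    d = max(d, 1e-6)  # guard against division by zero at the obstacle
    return 0.5 * gain * (1.0 / d - 1.0 / influence_radius) ** 2


def toy_cost(actions: Sequence[int]) -> float:
    """Toy model (assumed): action 1 offloads, action 0 computes locally.
    Offloaded tasks slow down as more users share the edge server."""
    n_off = sum(actions)
    delays = [4.0 if a == 0 else 1.0 + n_off for a in actions]
    energies = [2.0 if a == 0 else 0.5 for a in actions]
    return system_cost(delays, energies)
```

Calling `difference_rewards([1, 1, 0], toy_cost, default_action=0)` gives each offloading user a positive reward and the local user a reward of zero, because replacing the local user's action with the default changes nothing. In a training loop, a penalty such as `repulsive_potential` would typically be subtracted from a UAV agent's reward, which is one plausible reading of how a safety potential shapes trajectory learning.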

Key words: Mobile edge computing, multi-agent reinforcement learning, UAV obstacle avoidance, task offloading, three-dimensional trajectory optimization

CLC number: