Multi-objective task offloading algorithm based on deep Q-network

Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (6): 1668-1674. DOI: 10.11772/j.issn.1001-9081.2021061367

• Papers of the 2021 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2021) •

Shiquan DENG¹, Xuguo YE²

^1. School of Big Data Engineering, Kaili University, Kaili Guizhou 556011, China
^2. School of Sciences, Kaili University, Kaili Guizhou 556011, China

Received: 2021-08-02 Revised: 2021-08-15 Accepted: 2021-09-28 Online: 2022-01-10 Published: 2022-06-10
Contact: Xuguo YE
About author: DENG Shiquan, born in 1981 in Jiangkou, Guizhou, M. S., associate professor, CCF member. His research interests include intelligent information processing, edge computing, and computational intelligence.
Supported by: National Natural Science Foundation of China (11961038); Science and Technology Project of Education Department of Guizhou Province ([2017]333)

Abstract

In Mobile Edge Computing (MEC), a Mobile Device (MD) with limited computing resources and battery capacity can offload its computation-intensive applications to edge servers, which both enhances its computing capability and reduces its energy consumption. However, an unreasonable task offloading decision not only prolongs the application completion time but also greatly increases energy consumption, which degrades the user experience. To address this problem, firstly, by analyzing the mobility of the MD and the sequential dependencies between tasks, a multi-objective task offloading problem model was built for the dynamic MEC network, with minimization of the application completion time and the energy consumption as the optimization objectives. Then, a Markov Decision Process (MDP) model, including the state space, action space, and reward function, was designed to solve this problem, and a Multi-objective Task Offloading Algorithm based on Deep Q-Network (MTOA-DQN) was proposed, which improves the original DQN by using a whole trajectory as the smallest unit of the experience buffer. In a variety of test scenarios, MTOA-DQN outperformed three comparison algorithms, namely MultiObjective Evolutionary Algorithm based on Decomposition (MOEA/D), Adaptive DAG (Directed Acyclic Graph) Tasks Scheduling (ADTS), and the original DQN, in terms of both cumulative reward and Cost, which verifies the effectiveness and reliability of the algorithm.

Key words: Mobile Edge Computing (MEC), task offloading, completion time, energy consumption, Reinforcement Learning (RL)
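The abstract states that MTOA-DQN stores a whole trajectory, rather than a single transition, as the smallest unit of the experience buffer. The following is a minimal sketch of such a buffer, not the authors' implementation: the names `Transition` and `TrajectoryBuffer`, the uniform sampling, and the first-in-first-out eviction are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One MDP step; the field layout is a hypothetical choice."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class TrajectoryBuffer:
    """Replay buffer whose smallest stored unit is a complete trajectory."""

    def __init__(self, capacity: int) -> None:
        # capacity counts trajectories, not individual transitions;
        # the oldest trajectory is evicted first (an assumption).
        self._trajectories: Deque[List[Transition]] = deque(maxlen=capacity)

    def add(self, trajectory: List[Transition]) -> None:
        """Store one full episode as a single unit."""
        if not trajectory:
            raise ValueError("trajectory must contain at least one transition")
        self._trajectories.append(list(trajectory))

    def sample(self, batch_size: int) -> List[List[Transition]]:
        """Draw up to batch_size whole trajectories uniformly at random."""
        k = min(batch_size, len(self._trajectories))
        return random.sample(list(self._trajectories), k)

    def __len__(self) -> int:
        return len(self._trajectories)
```

Because sampling returns intact trajectories, the order of the offloading decisions within an application is preserved in every training batch, which seems a natural fit when tasks have sequential dependencies.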

CLC Number: TP391.9

Shiquan DENG, Xuguo YE. Multi-objective task offloading algorithm based on deep Q-network[J]. Journal of Computer Applications, 2022, 42(6): 1668-1674.

Figures and tables: 7

Fig. 1 Schematic diagram of MEC system

Fig. 2 DAG of an application

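Tab. 1 Summary of main symbols

Symbol	Definition
$Cost(G)$	Total cost for the MD to complete application $G$
$c_i$	Number of CPU cycles required to execute task $v_i$
$d_i$	Input data size of task $v_i$
$E$	Set of sequential dependencies between the tasks of application $G$
$EC(G)$	Energy consumed by the MD to execute application $G$
$e(v_i, v_j)$	Sequential dependency between tasks $v_i$ and $v_j$
$exit(G)$	Set of exit tasks of application $G$
$f^l$	Computing capability of the MD
$f_m^s$	Computing capability of the $m$-th edge server
$FT(G)$	Completion time of application $G$
$G$	Application to be executed by the MD
$M$	Number of edge servers
$\mathcal{M}$	Set of edge servers
$m$	The $m$-th edge server
$o_i$	Output data size of task $v_i$
$pre(v_i)$	Set of immediate predecessors of task $v_i$
$pw^t$	Transmission power of the MD
$pw^r$	Reception power of the MD
$R_m$	Achievable uplink transmission rate between the MD and edge server $E_m$
$suc(v_i)$	Set of immediate successors of task $v_i$
$V$	Set of tasks of application $G$
$v_i$	The $i$-th task of application $G$
$x$	Offloading decisions for all tasks of application $G$
$x_i$	Offloading decision for task $v_i$
$\alpha$	Weight of the application completion time
$\beta$	Weight of the MD energy consumption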
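To make the symbols above concrete, the sketch below derives the predecessor sets, successor sets, and exit tasks of a small DAG from its dependency set E, and combines completion time and energy into a single cost. The function names are hypothetical, exit tasks are taken to be tasks with no successors, and the weighted-sum form of the cost is an assumption suggested by the definitions of α and β; this page does not state the actual formula.

```python
from typing import Dict, List, Set, Tuple

# (i, j) stands for e(v_i, v_j): task v_j depends on task v_i.
Edge = Tuple[int, int]


def pre(tasks: List[int], edges: Set[Edge]) -> Dict[int, Set[int]]:
    """Immediate predecessors of every task."""
    result: Dict[int, Set[int]] = {v: set() for v in tasks}
    for i, j in edges:
        result[j].add(i)
    return result


def suc(tasks: List[int], edges: Set[Edge]) -> Dict[int, Set[int]]:
    """Immediate successors of every task."""
    result: Dict[int, Set[int]] = {v: set() for v in tasks}
    for i, j in edges:
        result[i].add(j)
    return result


def exit_tasks(tasks: List[int], edges: Set[Edge]) -> Set[int]:
    """Tasks without successors (assumed meaning of exit(G))."""
    successors = suc(tasks, edges)
    return {v for v in tasks if not successors[v]}


def cost(ft: float, ec: float, alpha: float, beta: float) -> float:
    """ASSUMPTION: Cost(G) is the weighted sum alpha*FT(G) + beta*EC(G)."""
    return alpha * ft + beta * ec


if __name__ == "__main__":
    tasks = [1, 2, 3, 4]
    edges = {(1, 2), (1, 3), (2, 4), (3, 4)}  # diamond-shaped DAG
    print(pre(tasks, edges)[4], exit_tasks(tasks, edges))
    print(cost(ft=2.0, ec=4.0, alpha=0.5, beta=0.5))
```

In the diamond-shaped example, task 4 can only start after tasks 2 and 3 finish, so an offloading decision for task 4 has to account for where both predecessors ran.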

Fig. 3 Cumulative rewards under different parameters

Fig. 4 Cumulative rewards of two DQNs under different task sizes N

Fig. 5 Box plots of four algorithms under different task sizes N

Tab. 2 Comparison of average Cost for different task sizes N

Algorithm	$N = 10$	$N = 20$	$N = 30$
MOEA/D	4.6665	9.4871	14.2537
ADTS	4.6323	9.4197	14.2175
DQN	4.4931	9.1647	13.6734
MTOA-DQN	4.4878	9.1066	13.6061
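The differences in the table are easier to judge as relative reductions. The short sketch below computes, for each baseline, how much lower the average Cost of the proposed algorithm is. The values are copied from the table, the proposed algorithm is labelled MTOA-DQN as in the abstract, and the helper name `relative_reduction` is hypothetical.

```python
from typing import Dict

# Average Cost per algorithm and task size N, copied from Tab. 2.
AVG_COST: Dict[str, Dict[int, float]] = {
    "MOEA/D": {10: 4.6665, 20: 9.4871, 30: 14.2537},
    "ADTS": {10: 4.6323, 20: 9.4197, 30: 14.2175},
    "DQN": {10: 4.4931, 20: 9.1647, 30: 13.6734},
    "MTOA-DQN": {10: 4.4878, 20: 9.1066, 30: 13.6061},
}


def relative_reduction(baseline: str, proposed: str = "MTOA-DQN") -> Dict[int, float]:
    """Fraction by which the proposed algorithm lowers the baseline's average Cost."""
    base = AVG_COST[baseline]
    prop = AVG_COST[proposed]
    return {n: (base[n] - prop[n]) / base[n] for n in base}


if __name__ == "__main__":
    for name in ("MOEA/D", "ADTS", "DQN"):
        row = relative_reduction(name)
        print(name, {n: f"{100 * r:.2f}%" for n, r in row.items()})
```

Under this calculation the reduction relative to MOEA/D and ADTS is a few percent, while the reduction relative to the original DQN is below one percent for every N, so the gain attributable to the trajectory-based buffer is modest in terms of average Cost.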
