Edge computing and service offloading algorithm based on improved deep reinforcement learning

Journal of Computer Applications, 2023, Vol. 43, Issue (5): 1543-1550. DOI: 10.11772/j.issn.1001-9081.2022050724

Special topic: Advanced computing

Tengfei CAO, Yanliang LIU, Xiaoying WANG

Department of Computer Technology and Applications, Qinghai University, Xining, Qinghai 810016, China

Received: 2022-05-19 Revised: 2022-06-25 Accepted: 2022-06-27 Online: 2022-06-30 Published: 2023-05-10
Contact: Tengfei CAO
About the authors:
CAO Tengfei, born in 1987, from Zhongxiang, Hubei, Ph. D., associate professor, CCF senior member. His research interests include edge computing in B5G networks. caotf@qhu.edu.cn
LIU Yanliang, born in 2002, from Hengyang, Hunan, M. S. candidate. His research interests include edge computing and reinforcement learning.
WANG Xiaoying, born in 1982, from Da'an, Jilin, Ph. D., professor, doctoral supervisor. Her research interests include computer network architecture and mobile computing.
Supported by:
National Natural Science Foundation of China (62101299); Natural Science Foundation of Qinghai Province (2020-ZJ-943Q)

Abstract

To address the limited computing resources and storage space of edge nodes in Edge Computing (EC) networks, an Edge Computing and Service Offloading (ECSO) algorithm based on improved Deep Reinforcement Learning (DRL) was proposed to reduce node processing latency and improve service performance. Specifically, the edge node service offloading problem was formulated as a resource-constrained Markov Decision Process (MDP), and a DRL algorithm was used because the state transition probabilities of requests at an edge node are difficult to predict accurately. Because the state-action space of an edge node that caches services is too large, new action behaviors were defined to replace the original actions, and the optimal action set was obtained with the proposed action selection algorithm. This improved the calculation of action behavior rewards, greatly reduced the size of the action space, and improved the training efficiency and reward of the algorithm. Simulation results show that, compared with the original Deep Q-Network (DQN) algorithm, the Proximal Policy Optimization (PPO) algorithm and the traditional Most Popular (MP) algorithm, the total reward of ECSO is increased by 7.0%, 12.7% and 65.6%, respectively, and the latency of edge node service offloading is reduced by 13.0%, 18.8% and 66.4%, respectively. These results verify the effectiveness of ECSO and show that it can effectively improve the offloading performance of edge computing services.

Key words: Edge Computing (EC), caching service, service offloading, Deep Reinforcement Learning (DRL), action behavior reward
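
The abstract does not give the details of the action selection algorithm, so the following is only a minimal sketch of the general idea of shrinking a caching action space with a resource-feasibility filter. It is not the authors' method. The service names, their storage and compute demands, the two capacities, and the "keep only maximal feasible sets" rule (which assumes that caching one more service is never worse) are all hypothetical values and assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical per-service demands: (storage units, compute units).
SERVICES = {
    "video": (4, 3),
    "ar": (3, 4),
    "map": (2, 1),
    "voice": (1, 2),
}
STORAGE_CAP = 6  # assumed storage capacity of the edge node
COMPUTE_CAP = 6  # assumed computing capacity of the edge node


def all_cache_sets(services):
    """Enumerate every subset of services: the unfiltered action space (2^N sets)."""
    names = sorted(services)
    return [
        frozenset(combo)
        for k in range(len(names) + 1)
        for combo in combinations(names, k)
    ]


def is_feasible(cache_set, services, storage_cap, compute_cap):
    """Check that the cached services fit within both resource limits."""
    storage = sum(services[s][0] for s in cache_set)
    compute = sum(services[s][1] for s in cache_set)
    return storage <= storage_cap and compute <= compute_cap


def feasible_actions(services, storage_cap, compute_cap):
    """Drop every cache set that violates a resource constraint."""
    return [
        a for a in all_cache_sets(services)
        if is_feasible(a, services, storage_cap, compute_cap)
    ]


def maximal_actions(services, storage_cap, compute_cap):
    """Keep feasible sets to which no further service can be added.

    Assumption: caching an extra service never lowers the reward, so a
    non-maximal set is dominated by one of its feasible supersets.
    """
    feasible = feasible_actions(services, storage_cap, compute_cap)
    return [
        a for a in feasible
        if not any(
            is_feasible(a | {s}, services, storage_cap, compute_cap)
            for s in services if s not in a
        )
    ]


if __name__ == "__main__":
    print("all actions:", len(all_cache_sets(SERVICES)))
    print("feasible actions:", len(feasible_actions(SERVICES, STORAGE_CAP, COMPUTE_CAP)))
    print("maximal feasible actions:", len(maximal_actions(SERVICES, STORAGE_CAP, COMPUTE_CAP)))
```

With these made-up numbers the agent would choose among 5 candidate cache sets instead of 16. In a DQN-style learner, a filtered set like this could serve as the discrete action set, although how ECSO actually defines and filters its actions is described only in the full paper.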

CLC number:

TP393

Cite this article: Tengfei CAO, Yanliang LIU, Xiaoying WANG. Edge computing and service offloading algorithm based on improved deep reinforcement learning[J]. Journal of Computer Applications, 2023, 43(5): 1543-1550.

Figures and tables (9)

Fig. 1 Edge computing service offloading model

Fig. 2 Flow of ECSO algorithm

Fig. 3 Performance comparison between cloud computing and edge computing

Fig. 4 Comparison of rewards in different training episodes

Fig. 5 Comparison of cumulative transmission latency reduction

Tab. 1 Final performance comparison of four algorithms after training to stability

Algorithm	Reward value	Reduction in transmission latency/s
ECSO	144.03	158.42
DQN-Based	134.57	140.25
PPO-Based	127.85	133.32
MP	86.97	95.20
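
The percentages quoted in the abstract appear to match relative gains computed from Tab. 1. The short check below is a sketch under that reading: it assumes each percentage is (ECSO value - baseline value) / baseline value, rounded to one decimal place, applied to both the reward column and the latency-reduction column. The function and variable names are illustrative only.

```python
# Values copied from Tab. 1: (reward value, reduction in transmission latency in s).
TABLE_1 = {
    "ECSO": (144.03, 158.42),
    "DQN-Based": (134.57, 140.25),
    "PPO-Based": (127.85, 133.32),
    "MP": (86.97, 95.20),
}


def relative_gain_percent(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent, to one decimal."""
    return round((ours - baseline) / baseline * 100, 1)


def gains_over_baselines(table, ours="ECSO"):
    """Return {baseline: (reward gain %, latency-reduction gain %)} for `ours`."""
    our_reward, our_latency = table[ours]
    return {
        name: (
            relative_gain_percent(our_reward, reward),
            relative_gain_percent(our_latency, latency),
        )
        for name, (reward, latency) in table.items()
        if name != ours
    }


if __name__ == "__main__":
    for name, (reward_gain, latency_gain) in gains_over_baselines(TABLE_1).items():
        print(f"ECSO vs {name}: reward +{reward_gain}%, latency reduction +{latency_gain}%")
```

Under this reading, the 13.0%, 18.8% and 66.4% figures describe how much more transmission latency ECSO saves than each baseline, rather than a percentage drop in absolute latency.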

Fig. 6 Comparison of latency reduction under different storage space and computing resources

Fig. 7 Latency reduction under different numbers of users

Fig. 8 Latency reduction under different λ parameters
