Joint optimization method for SWIPT edge network based on deep reinforcement learning

Journal of Computer Applications, 2023, 43(11): 3540-3550. DOI: 10.11772/j.issn.1001-9081.2022111732

Special Issue: Network and communications


Zhe WANG (1, 2, 3), Qiming WANG (2), Taoshen LI (4), Lina GE (1, 3, 5)

1. School of Artificial Intelligence, Guangxi Minzu University, Nanning Guangxi 530006, China
2. College of Electronic Information, Guangxi Minzu University, Nanning Guangxi 530006, China
3. Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis (Guangxi Minzu University), Nanning Guangxi 530006, China
4. School of Computer, Electronics and Information, Guangxi University, Nanning Guangxi 530004, China
5. Key Laboratory of Network Communication Engineering, Guangxi Minzu University, Nanning Guangxi 530006, China

Received: 2022-11-22; Revised: 2023-04-30; Accepted: 2023-05-12; Online: 2023-06-02; Published: 2023-11-10
Contact: Qiming WANG (wqm082199@163.com)
About the authors:
WANG Zhe, male, born in 1991, from Nanyang, Henan, Ph. D., associate professor, CCF member. His research interests include computer networks, simultaneous wireless information and power transfer, and federated machine learning.
WANG Qiming, male, born in 1997, from Suqian, Jiangsu, M. S. candidate. His research interests include computer networks, simultaneous wireless information and power transfer, and machine learning.
LI Taoshen, male, born in 1957, from Nanning, Guangxi, Ph. D., professor, CCF distinguished member. His research interests include mobile wireless networks, wireless energy transmission, the internet of things, and smart cities.
GE Lina, female, born in 1969, from Huanjiang, Guangxi, Ph. D., professor, CCF senior member. Her research interests include network and information security, mobile computing, and artificial intelligence.
Supported by: National Natural Science Foundation of China (61862007); Natural Science Foundation of Guangxi Province (2020GXNSFBA297103); Scientific Research Start Project of Talents Introduced by Guangxi Minzu University (2019KJQD17).

Abstract


Edge Computing (EC) and Simultaneous Wireless Information and Power Transfer (SWIPT) technologies can improve the performance of traditional networks, but they also make system decision-making more difficult and complex. System decisions designed with optimization methods often have high computational complexity and struggle to meet the real-time requirements of the system. Therefore, for a Wireless Sensor Network (WSN) assisted by EC and SWIPT, a mathematical model for optimizing system energy efficiency was proposed by jointly considering beamforming, computation offloading and power control in the network. Then, to address the non-convexity and parameter coupling of this model, a joint optimization method based on deep reinforcement learning was proposed by designing the information interchange process of the system. This method does not require building an environment model, and it uses a reward function instead of a Critic network to evaluate actions, which reduces the difficulty of decision-making and improves the real-time performance of the system. Finally, an Improved Deep Deterministic Policy Gradient (IDDPG) algorithm was designed based on the joint optimization method. Simulation comparisons with a variety of optimization algorithms and machine learning algorithms verify the advantages of the joint optimization method in reducing computational complexity and improving the real-time performance of decision-making.

Key words: Wireless Sensor Network (WSN), deep reinforcement learning, SWIPT (Simultaneous Wireless Information and Power Transfer), Edge Computing (EC), joint optimization
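The abstract states that actions are evaluated with a reward function rather than a learned Critic network. The sketch below illustrates that idea in a deliberately reduced form and is not the authors' IDDPG. It assumes a single fixed channel realization, a toy energy-efficiency reward (sum rate divided by total consumed power) for a few interfering links, a hypothetical per-link circuit power, and a finite-difference gradient in place of backpropagation. The names `energy_efficiency`, `actor` and `train`, and the channel gains, are illustrative; only the 38 dBm power cap and the -114 dBm noise level come from Tab. 1.

```python
import math
import random

# Hypothetical toy setup (assumptions, not the paper's system model):
# K interfering links; gains[i][j] is the linear channel gain from transmitter j to receiver i.
NOISE_W = 10 ** (-114 / 10) / 1000   # -114 dBm noise power (Tab. 1), in watts
P_MAX_W = 10 ** (38 / 10) / 1000     # 38 dBm maximum transmit power (Tab. 1), in watts
P_CIRCUIT_W = 0.1                    # hypothetical fixed circuit power per link, in watts


def energy_efficiency(powers, gains):
    """Reward: sum rate (bit/s/Hz) divided by total consumed power (W)."""
    k = len(powers)
    sum_rate = 0.0
    for i in range(k):
        interference = sum(gains[i][j] * powers[j] for j in range(k) if j != i)
        sinr = gains[i][i] * powers[i] / (interference + NOISE_W)
        sum_rate += math.log2(1 + sinr)
    return sum_rate / (sum(powers) + k * P_CIRCUIT_W)


def actor(theta):
    """Toy deterministic actor: squashes each parameter into a power in (0, P_MAX_W)."""
    return [P_MAX_W / (1 + math.exp(-t)) for t in theta]


def train(gains, steps=300, lr=0.5, eps=1e-4, seed=0):
    """Gradient ascent on the known reward itself; no Critic (Q-network) is learned."""
    rng = random.Random(seed)
    theta = [rng.uniform(-1.0, 1.0) for _ in gains]
    history = []
    for _ in range(steps):
        base = energy_efficiency(actor(theta), gains)
        history.append(base)
        # Finite-difference estimate of d(reward)/d(theta_i), standing in for backprop.
        grad = []
        for i in range(len(theta)):
            shifted = list(theta)
            shifted[i] += eps
            grad.append((energy_efficiency(actor(shifted), gains) - base) / eps)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return actor(theta), history


if __name__ == "__main__":
    # Hypothetical gains: strong direct links (-100 dB), weak cross links (-120 dB).
    gains = [[1e-10, 1e-12], [1e-12, 1e-10]]
    powers, history = train(gains)
    print(f"reward: {history[0]:.2f} -> {history[-1]:.2f}; powers (W): {powers}")
```

The design point the sketch tries to convey is that when the reward is a known, cheap-to-evaluate function of the action, it can score candidate actions directly, so no value network has to be fitted first.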


Zhe WANG, Qiming WANG, Taoshen LI, Lina GE. Joint optimization method for SWIPT edge network based on deep reinforcement learning[J]. Journal of Computer Applications, 2023, 43(11): 3540-3550.


Figures and tables (14)

Fig. 1 Wireless sensor network edge computing system based on SWIPT

Fig. 2 Schematic diagram of system cycle

Fig. 3 Schematic diagram of information interchange

Fig. 4 Schematic diagram of multi-agent algorithm

Fig. 5 Network structure

Tab. 1 Simulation parameters

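Parameter	Value
Number of sink nodes n	10
Number of sensor nodes k	20
Maximum transmit power of a sink node/dBm	38
Cycle length/ms	0.02
Gaussian white noise power/dBm	-114
Power splitting factor $\zeta$	0.5
Time ratio of energy harvesting $\eta$	0.5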
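To make the units in Tab. 1 concrete, the sketch below converts the dBm values to watts and computes a simple SWIPT budget for one sink-to-sensor link. The receiver model is an assumption, not taken from the paper: a fraction $\zeta$ of the received power goes to the energy harvester and $1-\zeta$ to the information decoder, and harvesting is credited for a fraction $\eta$ of the cycle. The channel gain (-100 dB) and the RF-to-DC conversion efficiency (0.7) are hypothetical values chosen only for illustration.

```python
import math


def dbm_to_watt(dbm):
    """Convert a power level in dBm to watts: P[W] = 10^(dBm / 10) / 1000."""
    return 10 ** (dbm / 10) / 1000


# Values from Tab. 1
P_MAX_DBM = 38.0       # maximum sink transmit power
NOISE_DBM = -114.0     # Gaussian white noise power
ZETA = 0.5             # power splitting factor
ETA = 0.5              # time ratio of energy harvesting
PERIOD_S = 0.02e-3     # cycle length, 0.02 ms

# Hypothetical values (not given on this page)
PATH_GAIN = 1e-10      # linear sink-to-sensor channel gain (-100 dB)
RF_DC_EFF = 0.7        # RF-to-DC conversion efficiency


def swipt_budget(p_tx_dbm, gain, zeta, eta, period_s, eff):
    """Return (SNR at the information decoder, energy harvested per cycle in J).

    Assumed receiver: a fraction zeta of the received power feeds the energy
    harvester and 1 - zeta feeds the information decoder; harvesting is
    credited for a fraction eta of the cycle.
    """
    p_rx_w = dbm_to_watt(p_tx_dbm) * gain
    snr = (1 - zeta) * p_rx_w / dbm_to_watt(NOISE_DBM)
    energy_j = eff * zeta * p_rx_w * eta * period_s
    return snr, energy_j


snr, energy_j = swipt_budget(P_MAX_DBM, PATH_GAIN, ZETA, ETA, PERIOD_S, RF_DC_EFF)
print(f"P_max = {dbm_to_watt(P_MAX_DBM):.3f} W, noise = {dbm_to_watt(NOISE_DBM):.3e} W")
print(f"SNR = {10 * math.log10(snr):.1f} dB, harvested energy = {energy_j:.3e} J per cycle")
```

With these numbers, 38 dBm is about 6.31 W and -114 dBm is about 3.98e-15 W, so the assumed -100 dB link still leaves roughly 49 dB of SNR for decoding after the 50% power split.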


Tab. 2 IDDPG hyperparameters

Parameter	Value
Exploration rate (e)	0.01
Batch size	128
Discount factor	0.5
Learning rate	0.001
Soft update rate	0.01
Hidden layer units	(200, 100, 50)
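Tab. 2 lists a soft update rate of 0.01 and a discount factor of 0.5. In standard DDPG (Lillicrap et al., "Continuous control with deep reinforcement learning"), the soft update moves target-network weights toward the online weights by a factor τ per step, and γ discounts future rewards. The sketch below assumes the table's "soft update rate" plays the role of τ; which networks IDDPG actually soft-updates is not stated on this page. With γ = 0.5 the effective horizon 1/(1 - γ) is only about 2 steps, so the learned policy weights immediate reward heavily.

```python
import math

TAU = 0.01    # soft update coefficient ("soft update rate" in Tab. 2)
GAMMA = 0.5   # discount factor from Tab. 2


def soft_update(target, online, tau=TAU):
    """DDPG-style target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1 - tau) * t for t, w in zip(target, online)]


def discounted_return(rewards, gamma=GAMMA):
    """G_0 = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def updates_to_close_gap(fraction, tau=TAU):
    """Soft updates needed before a target has closed `fraction` of its gap to a fixed online value."""
    return math.ceil(math.log(1 - fraction) / math.log(1 - tau))


print(soft_update([0.0, 0.0], [1.0, 2.0]))      # one step moves the target 1% toward the online weights
print(discounted_return([1.0, 1.0, 1.0, 1.0]))  # 1 + 0.5 + 0.25 + 0.125 = 1.875
print(updates_to_close_gap(0.5))                # 69 updates to close half the gap
```

A small τ such as 0.01 makes the target lag the online network by dozens of updates, which is the usual stabilizing effect of soft updates in DDPG.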

Fig. 6 Convergence of algorithm under different learning rates

Fig. 7 Training results

Fig. 8 Loss value change curve

Fig. 9 Test set result graph

Tab. 3 Comparison of performance test results on the test set

(Sinks, Sensors)	Iterations: WMMSE	Iterations: FP	Decision time/ms: WMMSE	Decision time/ms: FP	Decision time/ms: IDDPG	Decision time/ms: DQN	Accuracy/%: FP	Accuracy/%: IDDPG	Accuracy/%: DQN	Accuracy/%: MaxPower
(10, 20)	47	24	24.3	18.6	1.05	1.20	94	96	95	35
(20, 40)	75	29	64.3	18.5	0.53	0.94	94	94	92	29
(20, 60)	83	28	120.0	25.3	0.41	0.64	93	93	93	21
(20, 100)	96	32	179.4	28.4	0.59	0.78	91	91	90	14
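The decision-time columns of Tab. 3 can be turned into speedup factors. The short script below only reorganizes the reported numbers; it adds no new measurements.

```python
# (sinks, sensors): (WMMSE ms, FP ms, IDDPG ms, DQN ms), copied from Tab. 3
DECISION_TIME_MS = {
    (10, 20): (24.3, 18.6, 1.05, 1.20),
    (20, 40): (64.3, 18.5, 0.53, 0.94),
    (20, 60): (120.0, 25.3, 0.41, 0.64),
    (20, 100): (179.4, 28.4, 0.59, 0.78),
}


def speedups(times):
    """Return how many times faster IDDPG decides than WMMSE and than FP."""
    wmmse, fp, iddpg, _dqn = times
    return wmmse / iddpg, fp / iddpg


for size, times in DECISION_TIME_MS.items():
    vs_wmmse, vs_fp = speedups(times)
    print(f"{size}: IDDPG is {vs_wmmse:.0f}x faster than WMMSE, {vs_fp:.0f}x faster than FP")
```

For the reported configurations this gives roughly 23x to 304x over WMMSE and 18x to 62x over FP, while the accuracy columns stay within a few percentage points of FP.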

Fig. 10 CDF of the objective function under different sensor mobility conditions

Fig. 11 CDF for different numbers of moving and non-moving nodes (perfect CSI)
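Figs. 10 and 11 report results as cumulative distribution functions (CDFs) of the objective. The value of a CDF curve at x is the fraction of test samples whose objective is at most x, so for a maximization objective such as energy efficiency, a curve lying further to the right indicates better performance. Below is a minimal empirical-CDF sketch; the sample values are hypothetical and only illustrate the computation.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return F(x) = fraction of samples <= x, the quantity plotted on a CDF curve."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def cdf_points(samples):
    """(value, F(value)) pairs at each distinct sample, ready for a step plot."""
    cdf = empirical_cdf(samples)
    return [(x, cdf(x)) for x in sorted(set(samples))]


# Hypothetical per-episode objective values, for illustration only.
objective = [3.1, 2.4, 2.4, 4.0, 3.6]
F = empirical_cdf(objective)
print(F(2.4), F(3.5), F(4.0))   # 0.4 0.6 1.0
print(cdf_points(objective))
```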

