DDPG-based resource allocation in D2D communication-empowered cellular network

doi:10.11772/j.issn.1001-9081.2023050612

Journal of Computer Applications, 2024, Vol. 44, Issue (5): 1562-1569. DOI: 10.11772/j.issn.1001-9081.2023050612

• Network and Communications •

Rui TANG¹,², Chuanlin PANG², Ruizhi ZHANG³, Chuan LIU¹, Shibo YUE²

1. School of Electronic Information Engineering, China West Normal University, Nanchong Sichuan 637002, China
2. College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu Sichuan 610059, China
3. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu Sichuan 611731, China

Received: 2023-05-22 Revised: 2023-08-02 Accepted: 2023-08-08 Online: 2023-08-10 Published: 2024-05-10
Contact: Rui TANG
About author: TANG Rui, born in 1988 in Lanzhou, Gansu, Ph. D., associate professor, CCF member. His research interests include wireless communication.
PANG Chuanlin, born in 1997 in Nanchong, Sichuan, M. S. candidate. His research interests include deep reinforcement learning.
ZHANG Ruizhi, born in 1999 in Jining, Shandong, Ph. D. candidate. His research interests include generalized optimization algorithms.
LIU Chuan, born in 1991 in Nanchong, Sichuan, M. S., lecturer. His research interests include wireless communication.
YUE Shibo, born in 2000 in Bazhong, Sichuan, M. S. candidate, CCF member. His research interests include resource allocation in unmanned aerial vehicle communication.
Supported by:
National Natural Science Foundation of China (62301450); Sichuan Provincial Natural Science Foundation (24NSFSC5070); Sichuan Provincial Regional Innovation Cooperation Project (2022YFQ0017); Fundamental Research Funds of Chengdu University of Technology (10912-KYQD2019_08164)

Abstract

To deal with the co-channel interference in Device-to-Device (D2D) communication-empowered cellular networks, the sum rate of D2D links was maximized through joint channel allocation and power control while satisfying the power constraints and the Quality-of-Service (QoS) requirements of cellular links. To efficiently solve the resulting mixed-integer non-convex programming problem, the original problem was transformed into a Markov decision process, and a mechanism based on the Deep Deterministic Policy Gradient (DDPG) algorithm was proposed. Through offline training, the mapping from the channel state information to the optimal resource allocation policy was built directly, without solving any optimization problem, so the mechanism can be deployed online. Simulation results show that, compared with the exhaustive search-based mechanism, the proposed mechanism reduces the computation time by 4 orders of magnitude (99.51%) at the cost of only 9.726% performance loss.

Key words: Device-to-Device (D2D) communication, resource allocation, Markov decision process, deep reinforcement learning, Deep Deterministic Policy Gradient (DDPG) algorithm
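
The objective named in the abstract, the sum rate of D2D links under co-channel interference, can be illustrated with a minimal sketch. The interference model below is a generic Shannon-rate formulation and an assumption of this note, not the paper's exact system model: each channel is assumed to carry one cellular user, D2D pairs reusing the same channel are assumed to interfere with each other, and the QoS constraint on cellular links is omitted. The function name `d2d_sum_rate` and all argument names are hypothetical.

```python
import numpy as np


def d2d_sum_rate(channel, power, g_d2d, g_cell_to_d2d, p_cell, noise):
    """Sum rate (bit/s/Hz) of M D2D links under an assumed interference model.

    channel[m]          index of the channel reused by D2D pair m
    power[m]            transmit power of D2D pair m
    g_d2d[j, m]         gain from D2D transmitter j to D2D receiver m
    g_cell_to_d2d[k, m] gain from the cellular user on channel k to D2D receiver m
    p_cell[k]           transmit power of the cellular user on channel k
    noise               noise power at each receiver
    """
    channel = np.asarray(channel)
    power = np.asarray(power, dtype=float)
    g_d2d = np.asarray(g_d2d, dtype=float)
    g_cell_to_d2d = np.asarray(g_cell_to_d2d, dtype=float)
    p_cell = np.asarray(p_cell, dtype=float)

    total = 0.0
    for m in range(len(channel)):
        k = channel[m]
        # Other D2D pairs that reuse the same channel as pair m.
        same = channel == k
        same[m] = False
        interference = (
            p_cell[k] * g_cell_to_d2d[k, m]        # from the cellular user
            + np.sum(power[same] * g_d2d[same, m])  # from co-channel D2D pairs
        )
        sinr = power[m] * g_d2d[m, m] / (noise + interference)
        total += np.log2(1.0 + sinr)
    return total
```

In this sketch the integer vector `channel` and the continuous vector `power` are the joint decision variables, which is what makes the underlying problem mixed-integer and non-convex.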

CLC number:

TP393

Rui TANG, Chuanlin PANG, Ruizhi ZHANG, Chuan LIU, Shibo YUE. DDPG-based resource allocation in D2D communication-empowered cellular network[J]. Journal of Computer Applications, 2024, 44(5): 1562-1569.

Figures and tables (9)

Fig. 1 D2D communication-empowered uplink cellular network

Fig. 2 Reinforcement learning model

Fig. 3 Offline training framework based on DDPG algorithm

Fig. 4 Structure of fully connected feed-forward DNN

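Tab. 1 Hyperparameter settings in offline training

Hyperparameter	Value
Number of training episodes $R$	2000
Iterations per episode $T$	100
Experience replay buffer size $N_E$	5000
Mini-batch size $N_B$	128
Discount factor $\gamma$	0.9
Soft-update coefficient of target networks $\tau$	0.01
Number of hidden layers in policy network/target policy network	2
Number of hidden-layer neurons in policy network/target policy network	(128, 64)
Number of hidden layers in Q-value network/target Q-value network	2
Number of hidden-layer neurons in Q-value network/target Q-value network	(128, 64)
Hidden-layer activation function	ReLU
Output-layer activation function	sigmoid
Optimizer	Adam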
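
To make the settings in Tab. 1 concrete, the following is a minimal NumPy sketch of a policy network with the listed layer sizes and activations, together with the standard DDPG soft target update and temporal-difference target. It is an illustration under stated assumptions, not the authors' implementation: the state and action dimensions, the weight initialization, and the function names (`init_mlp`, `actor_forward`, `soft_update`, `td_target`) are hypothetical, and the training loop, replay buffer and Adam optimizer are omitted.

```python
import numpy as np

HIDDEN = (128, 64)  # hidden-layer neuron counts from Tab. 1
TAU = 0.01          # soft-update coefficient from Tab. 1
GAMMA = 0.9         # discount factor from Tab. 1


def init_mlp(in_dim, out_dim, rng):
    """Create (weight, bias) pairs for a fully connected network.

    The small Gaussian initialization is an assumption of this sketch.
    """
    sizes = (in_dim, *HIDDEN, out_dim)
    return [
        (rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]


def actor_forward(params, state):
    """Map a state vector to an action in (0, 1) per output dimension."""
    x = np.asarray(state, dtype=float)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)          # ReLU hidden layers
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))   # sigmoid output layer


def soft_update(target, online, tau=TAU):
    """Return target parameters moved a fraction tau toward the online ones."""
    return [
        (tau * w + (1.0 - tau) * wt, tau * b + (1.0 - tau) * bt)
        for (wt, bt), (w, b) in zip(target, online)
    ]


def td_target(reward, next_q, gamma=GAMMA):
    """One-step temporal-difference target used to train the Q-value network."""
    return reward + gamma * next_q
```

Because the sigmoid output lies in (0, 1), each action component would still have to be scaled to a power level or mapped to a channel index; how the paper performs that mapping is not stated in this excerpt.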

Fig. 5 Convergence of proposed offline training mechanism under different learning rates (M=6)

Fig. 6 Average convergence of best policy network in online deployment (M=6)

Fig. 7 Variation of sum transmission rate with number of D2D pairs

Tab. 2 Variation of operation time with number of D2D pairs (unit: s)

Number of D2D pairs $M$	Comparison mechanism 3	Comparison mechanism 4	Proposed mechanism
3	0.0346	2.146	0.0329
4	0.0353	3.859	0.0343
5	0.0369	8.257	0.0352
6	0.0373	16.359	0.0367
7	0.0378	30.436	0.0375
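
The relative time savings implied by Tab. 2 can be checked with a few lines of arithmetic. The sketch below computes, for each number of D2D pairs, the percentage reduction in operation time and the corresponding order-of-magnitude ratio between comparison mechanism 4 and the proposed mechanism. Mechanism 4 is used as the baseline here only because it has the largest times in the table; which comparison mechanism is the exhaustive search, and how the aggregate figure quoted in the abstract was derived, are not stated in this excerpt. The helper names are hypothetical.

```python
import math

# Operation times in seconds from Tab. 2:
# M -> (comparison mechanism 4, proposed mechanism)
TIMES = {
    3: (2.146, 0.0329),
    4: (3.859, 0.0343),
    5: (8.257, 0.0352),
    6: (16.359, 0.0367),
    7: (30.436, 0.0375),
}


def reduction_percent(baseline, proposed):
    """Percentage by which the proposed time is lower than the baseline time."""
    return 100.0 * (1.0 - proposed / baseline)


def orders_of_magnitude(baseline, proposed):
    """Base-10 logarithm of the speed-up ratio."""
    return math.log10(baseline / proposed)


for m, (baseline, proposed) in TIMES.items():
    print(
        f"M={m}: reduction {reduction_percent(baseline, proposed):.2f}%, "
        f"ratio 10^{orders_of_magnitude(baseline, proposed):.2f}"
    )
```

The baseline time roughly doubles with each additional D2D pair, while the proposed mechanism's time stays nearly flat, so the gap widens as $M$ grows.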
