Integrated deep reinforcement learning portfolio model

doi:10.11772/j.issn.1001-9081.2023010028

Journal of Computer Applications, 2024, Vol. 44, Issue (1): 300-310.

Frontier and comprehensive applications

Jie LONG¹, Liang XIE¹, Haijiao XU²

¹ College of Science, Wuhan University of Technology, Wuhan Hubei 430070, China
² College of Computer Science, Guangdong University of Education, Guangzhou Guangdong 510303, China

Received: 2023-01-11 Revised: 2023-04-22 Accepted: 2023-04-24 Online: 2023-06-06 Published: 2024-01-10
Corresponding author: Liang XIE
About authors: LONG Jie, born in 1996, from Suining, Sichuan, M. S. candidate. His research interests include deep learning, data mining.
XIE Liang, born in 1987, from Jingzhou, Hubei, Ph. D., associate professor. His research interests include machine learning, data mining, multimedia retrieval.
XU Haijiao, born in 1972, from Changde, Hunan, Ph. D., senior engineer. His research interests include big data, deep learning, software engineering.
Supported by:
Natural Science Foundation of Guangdong Province (2020A1515011208); Basic and Applied Basic Research Project of Guangzhou Basic Research Education Plan (202102080353); Natural Science Characteristic Innovation Project of Ordinary Colleges and Universities in Guangdong Province (2019KTSCX117)

Abstract:

Portfolio selection is an active research problem in quantitative trading. Existing portfolio models based on deep reinforcement learning can neither adapt their trading strategy to the market environment nor make effective use of supervised information. To address these shortcomings, an Integrated Deep Reinforcement Learning Portfolio Model (IDRLPM) was proposed. Firstly, a multi-agent approach was used to construct several base agents, each with a reward function designed for a different trading style, so that the agents represent different trading strategies. Secondly, ensemble learning was used to fuse the features of the base agents' policy networks, yielding an integrated agent that adapts to the market environment. Thirdly, a trend prediction network based on the Convolutional Block Attention Module (CBAM) was embedded in the integrated agent; its output guides the integrated policy network in adaptively choosing the proportion of each trade. Finally, by alternating supervised deep learning and reinforcement learning during training, IDRLPM makes effective use of the supervised information in the training data to improve profitability. On the Shanghai Stock Exchange (SSE) 50 and China Securities Index (CSI) 500 constituent stock datasets, IDRLPM achieves Sharpe Ratios (SR) of 1.87 and 1.88 and Cumulative Returns (CR) of 2.02 and 1.34, respectively. Compared with the Ensemble Deep Reinforcement Learning (EDRL) trading model, SR improves by 105% and 55%, and CR by 124% and 79%. These experimental results show that IDRLPM is effective for the portfolio problem on both datasets.

Key words: deep reinforcement learning, portfolio model, ensemble learning, Convolutional Block Attention Module (CBAM), trend prediction
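The relative gains over EDRL quoted in the abstract can be reproduced from the SR and CR values in Tab. 5 as (IDRLPM - EDRL) / EDRL. The short check below uses only numbers reported in this article; the dictionary layout and function names are illustrative choices.

```python
# Relative improvement of IDRLPM over EDRL, computed from the Tab. 5 values.
TABLE5 = {
    # metric -> (EDRL value, IDRLPM value), per dataset
    "SSE 50": {"SR": (0.91, 1.87), "CR": (0.90, 2.02)},
    "CSI 500": {"SR": (1.21, 1.88), "CR": (0.75, 1.34)},
}


def relative_gain(baseline: float, model: float) -> float:
    """Return (model - baseline) / baseline as a percentage."""
    return (model - baseline) / baseline * 100.0


def gains_table() -> dict:
    """Relative gain for every dataset and metric, rounded to a whole percent."""
    return {
        dataset: {metric: round(relative_gain(*pair)) for metric, pair in metrics.items()}
        for dataset, metrics in TABLE5.items()
    }


if __name__ == "__main__":
    for dataset, metrics in gains_table().items():
        print(dataset, metrics)
```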

CLC number: TP181

Cite as: Jie LONG, Liang XIE, Haijiao XU. Integrated deep reinforcement learning portfolio model[J]. Journal of Computer Applications, 2024, 44(1): 300-310.

Figures and tables (19)

Fig. 1 Overall framework of IDRLPM
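The abstract states that IDRLPM alternates supervised deep learning (for the trend prediction network) with reinforcement learning (for the integrated agent). The skeleton below shows only that control flow, under an assumed schedule of one supervised update followed by one RL update per episode; `supervised_step` and `rl_step` are hypothetical placeholders, not the authors' training code.

```python
from typing import Callable, List, Tuple

LogEntry = Tuple[int, str, float]  # (episode, phase, loss)


def alternate_training(
    episodes: int,
    supervised_step: Callable[[int], float],
    rl_step: Callable[[int], float],
) -> List[LogEntry]:
    """Run one supervised update, then one RL update, in every episode.

    supervised_step would fit the trend prediction network to trend labels;
    rl_step would update the integrated agent's policy on sampled trajectories.
    Both are hypothetical callables that return a scalar loss for logging.
    """
    log: List[LogEntry] = []
    for episode in range(episodes):
        log.append((episode, "supervised", supervised_step(episode)))
        log.append((episode, "rl", rl_step(episode)))
    return log


if __name__ == "__main__":
    # Dummy losses that decay with the episode index, only to show the call order.
    history = alternate_training(3, lambda e: 1.0 / (e + 1), lambda e: 2.0 / (e + 1))
    for entry in history:
        print(entry)
```

Keeping the two phases as injected callables makes it easy to change the ratio of supervised to RL updates without touching the loop.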

Fig. 2 Example of the DC method
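Tab. 1 lists three DC thresholds λ. Assuming DC refers to directional change (the subject of Ref. [20]), a common event-based formulation confirms a downturn once the price falls by a fraction λ below the running high, and an upturn once it rises by λ above the running low. The sketch below implements that rule and turns the events into per-step trend labels of the kind a trend prediction network could be trained on. The exact labeling used in IDRLPM is not given in this excerpt, so `directional_changes` and `trend_labels` are illustrative names.

```python
from typing import List, Optional, Tuple


def directional_changes(prices: List[float], threshold: float) -> List[Tuple[int, str]]:
    """Return (index, "up" or "down") for every confirmed directional-change event."""
    if not prices:
        return []
    events: List[Tuple[int, str]] = []
    trend: Optional[str] = None          # None until the first event is confirmed
    high = low = prices[0]               # running extremes since the last event
    for i in range(1, len(prices)):
        p = prices[i]
        if trend != "down":              # uptrend or undecided: watch for a drop
            high = max(high, p)
            if p <= high * (1.0 - threshold):
                events.append((i, "down"))
                trend, low = "down", p
                continue
        if trend != "up":                # downtrend or undecided: watch for a rise
            low = min(low, p)
            if p >= low * (1.0 + threshold):
                events.append((i, "up"))
                trend, high = "up", p
    return events


def trend_labels(prices: List[float], threshold: float) -> List[Optional[str]]:
    """Label each step with the trend confirmed so far (None before the first event)."""
    events = dict(directional_changes(prices, threshold))
    labels: List[Optional[str]] = []
    current: Optional[str] = None
    for i in range(len(prices)):
        current = events.get(i, current)
        labels.append(current)
    return labels


if __name__ == "__main__":
    closes = [100, 101, 103, 100, 99, 100, 102, 104]
    for lam in (0.015, 0.01, 0.005):     # the thresholds listed in Tab. 1
        print(lam, directional_changes(closes, lam))
```

Smaller thresholds confirm more, shorter trends; using several λ values gives the model labels at several time scales.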

Fig. 3 Structure of the trend prediction network
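Fig. 3 shows a trend prediction network built around CBAM. CBAM applies channel attention, σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) with a shared two-layer MLP, followed by spatial attention, a sigmoid over a convolution of the channel-wise average and max maps. For a feature map laid out as channels × time steps (for example the 9 fields of Tab. 3 over the L = 10 observed days), the spatial step becomes a 1-D convolution along time. The pure-Python sketch below is an illustration under those assumptions, with hypothetical sizes and random weights; it is not the network in Fig. 3.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # feature map laid out as [channel][time step]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP (C -> C/r -> C) with ReLU, shared by both pooled descriptors."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(f: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Rescale each channel by sigmoid(MLP(avg over time) + MLP(max over time))."""
    avg = [sum(ch) / len(ch) for ch in f]
    mx = [max(ch) for ch in f]
    scores = [a + m for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[sigmoid(s) * x for x in ch] for s, ch in zip(scores, f)]


def temporal_attention(f: Matrix, kernel: Matrix) -> Matrix:
    """CBAM's spatial attention applied along the time axis.

    kernel has shape (2, k) with odd k: row 0 weights the channel-average map,
    row 1 weights the channel-max map; zero padding keeps the length unchanged.
    """
    n_ch, n_t = len(f), len(f[0])
    avg = [sum(f[c][t] for c in range(n_ch)) / n_ch for t in range(n_t)]
    mx = [max(f[c][t] for c in range(n_ch)) for t in range(n_t)]
    k = len(kernel[0])
    pad = k // 2
    gate = []
    for t in range(n_t):
        s = 0.0
        for j in range(k):
            idx = t + j - pad
            if 0 <= idx < n_t:
                s += kernel[0][j] * avg[idx] + kernel[1][j] * mx[idx]
        gate.append(sigmoid(s))
    return [[gate[t] * f[c][t] for t in range(n_t)] for c in range(n_ch)]


def cbam(f: Matrix, w1: Matrix, w2: Matrix, kernel: Matrix) -> Matrix:
    """Channel attention followed by temporal attention (CBAM's sequential order)."""
    return temporal_attention(channel_attention(f, w1, w2), kernel)


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    C, T, r, k = 9, 10, 3, 3   # hypothetical: 9 fields, 10 days, reduction 3, kernel 3
    features = random_matrix(rng, C, T)
    out = cbam(features, random_matrix(rng, C // r, C),
               random_matrix(rng, C, C // r), random_matrix(rng, 2, k))
    print([round(x, 3) for x in out[0]])
```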

Fig. 4 Network structure of a base agent
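Tab. 1 lists a clipping range ε = 0.2 and a discount factor γ = 0.99, which suggests that the base agents are trained with a PPO-style clipped objective (PPO is also the baseline of Ref. [13]); this is an inference, not something the excerpt states. The sketch below computes discounted returns and the clipped surrogate $L^{CLIP} = \frac{1}{N}\sum_t \min\big(r_t A_t, \operatorname{clip}(r_t, 1-\varepsilon, 1+\varepsilon) A_t\big)$, where $r_t$ is the probability ratio between the new and old policies and $A_t$ the advantage.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one trajectory."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate L^CLIP (to be maximised) over a batch."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)                       # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # clip(ratio, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)          # pessimistic of the two terms
    return total / len(advantages)


if __name__ == "__main__":
    print(discounted_returns([0.01, -0.02, 0.03]))        # gamma = 0.99 as in Tab. 1
    print(ppo_clip_objective([-0.5, -1.2], [-0.7, -1.0], [0.4, -0.3]))
```

The clip caps how far one update can move the policy when the ratio leaves [1 - ε, 1 + ε], which matters for noisy financial rewards.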

Fig. 5 Structure of the integrated module
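According to the abstract, the integrated module fuses features from the base agents' policy networks under guidance from the trend prediction network. One simple way to realize such guidance is a softmax-weighted sum of the base-agent feature vectors, with gate logits derived from the trend prediction output. The excerpt does not specify the fusion rule, so `fuse_features` and the gate logits below are hypothetical. With equal logits the rule reduces to a plain average, the kind of baseline an ablation variant such as IDRLPM-mean (Tab. 7) might represent.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features: List[List[float]], gate_logits: List[float]) -> List[float]:
    """Weighted sum of base-agent feature vectors, weights = softmax(gate_logits).

    features[k] is the policy-network feature vector of base agent k;
    gate_logits[k] is a hypothetical score derived from the trend prediction output.
    """
    weights = softmax(gate_logits)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def mean_fusion(features: List[List[float]]) -> List[float]:
    """Equal logits give equal weights, i.e. a plain average of the base agents."""
    return fuse_features(features, [0.0] * len(features))


if __name__ == "__main__":
    # Hypothetical 2-D features from three base agents with different trading styles.
    agents = [[0.9, -0.2], [0.4, 0.1], [-0.3, 0.5]]
    print(mean_fusion(agents))
    print(fuse_features(agents, [2.0, 0.5, -1.0]))  # gate favours the first agent
```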

Tab. 1 IDRLPM experimental parameters

Parameter	Value
DC threshold $\lambda$	{0.015, 0.01, 0.005}
Episodes	200
Sampling steps	2400
Number of samples drawn $T$ (batch size)	64
Buffer capacity	2400
Clipping range $\varepsilon$	0.2
Discount factor $\gamma$	0.99
DSR adaptation rate $\eta$	0.001
Initial capital $b_0$	1,000,000
Transaction fee rate $\tau$	0.0015
Length of observed history $L$	10
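Tab. 1 gives a DSR adaptation rate η = 0.001. Assuming DSR is the differential Sharpe ratio of Moody and Saffell, a reward measuring each step's marginal contribution to an exponentially weighted Sharpe ratio, it can be computed online from two moving averages: $A_t = A_{t-1} + \eta\Delta A_t$ and $B_t = B_{t-1} + \eta\Delta B_t$, with $\Delta A_t = R_t - A_{t-1}$ and $\Delta B_t = R_t^2 - B_{t-1}$, giving $D_t = \frac{B_{t-1}\Delta A_t - \frac{1}{2}A_{t-1}\Delta B_t}{(B_{t-1} - A_{t-1}^2)^{3/2}}$. The class name `DifferentialSharpe` and the zero-variance guard are illustrative choices.

```python
class DifferentialSharpe:
    """Online differential Sharpe ratio (Moody and Saffell) used as a per-step reward."""

    def __init__(self, eta: float = 0.001) -> None:
        self.eta = eta   # adaptation rate, 0.001 in Tab. 1
        self.a = 0.0     # A_t: exponential moving average of returns
        self.b = 0.0     # B_t: exponential moving average of squared returns

    def update(self, r: float) -> float:
        """Return D_t for the new return r, then update A_t and B_t."""
        delta_a = r - self.a
        delta_b = r * r - self.b
        variance = self.b - self.a ** 2
        if variance > 1e-12:
            d = (self.b * delta_a - 0.5 * self.a * delta_b) / variance ** 1.5
        else:
            d = 0.0  # undefined until the running variance becomes positive
        self.a += self.eta * delta_a
        self.b += self.eta * delta_b
        return d


if __name__ == "__main__":
    reward = DifferentialSharpe(eta=0.001)
    for r in [0.01, -0.005, 0.012, 0.003, -0.002]:
        print(round(reward.update(r), 4))
```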

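The initial capital b₀ and fee rate τ in Tab. 1 enter the simulation whenever the agent rebalances. A common convention, assumed here because the excerpt does not give the exact accounting, charges τ on the traded notional of the risky assets (cash movements are free) and then lets holdings drift with the period's price relatives. The helpers `rebalance` and `step` are hypothetical names.

```python
from typing import List, Tuple

INITIAL_CAPITAL = 1_000_000   # b0 from Tab. 1
FEE_RATE = 0.0015             # tau from Tab. 1


def rebalance(value: float, current_w: List[float], target_w: List[float],
              fee: float = FEE_RATE) -> float:
    """Portfolio value left after moving from current_w to target_w.

    Index 0 is cash and is not charged. This one-pass estimate charges the fee on
    the pre-trade value; an exact treatment would solve for the post-fee value.
    """
    turnover = sum(abs(t - c) for t, c in zip(target_w[1:], current_w[1:]))
    return value * (1.0 - fee * turnover)


def step(value: float, weights: List[float],
         price_relatives: List[float]) -> Tuple[float, List[float]]:
    """Apply one period of price moves (cash has relative 1.0).

    Returns the new portfolio value and the drifted weights.
    """
    grown = [w * x for w, x in zip(weights, price_relatives)]
    growth = sum(grown)
    return value * growth, [g / growth for g in grown]


if __name__ == "__main__":
    v = rebalance(INITIAL_CAPITAL, [1.0, 0.0, 0.0], [0.2, 0.4, 0.4])
    v, w = step(v, [0.2, 0.4, 0.4], [1.0, 1.05, 0.98])
    print(round(v, 2), [round(x, 4) for x in w])
```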

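Tab. 2 Selected stock codes and company names

Category	Stock code	Company name
SSE 50 constituents (partial)	600809	Shanxi Fenjiu (山西汾酒)
	600585	Anhui Conch Cement (海螺水泥)
	603259	WuXi AppTec (药明康德)
	600000	Shanghai Pudong Development Bank (浦发银行)
	601899	Zijin Mining (紫金矿业)
	600050	China Unicom (中国联通)
CSI 500 constituents (partial)	000988	HGTECH (华工科技)
	000987	Yuexiu Capital (越秀资本)
	000983	Shanxi Coking Coal (山西焦煤)
	000528	LiuGong (柳工)
	300070	Originwater (碧水源)
	300155	Anjubao (安居宝)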

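Tab. 3 Experimental data fields

Field	Description
close	Closing price
volume	Trading volume
OBV	On-Balance Volume
MACD	Moving Average Convergence Divergence
WPR	Williams Percent Range
TSI	True Strength Index
AO	Awesome Oscillator (momentum oscillator)
RSI	Relative Strength Index
VWMA	Volume-Weighted Moving Average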
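Several fields in Tab. 3 are standard technical indicators derived from close and volume. The sketch below computes three of them with their common textbook definitions: OBV accumulates volume with the sign of the daily price change, VWMA is the volume-weighted mean of closes over a window, and RSI here uses simple (Cutler-style) averages of gains and losses rather than Wilder's smoothing. The window lengths are hypothetical; the article's settings are not given in this excerpt.

```python
from typing import List, Optional


def obv(close: List[float], volume: List[float]) -> List[float]:
    """On-Balance Volume: add volume on up days, subtract it on down days."""
    out = [0.0]
    for i in range(1, len(close)):
        if close[i] > close[i - 1]:
            out.append(out[-1] + volume[i])
        elif close[i] < close[i - 1]:
            out.append(out[-1] - volume[i])
        else:
            out.append(out[-1])
    return out


def vwma(close: List[float], volume: List[float], n: int) -> List[Optional[float]]:
    """Volume-weighted moving average over the last n bars (None until n bars exist)."""
    out: List[Optional[float]] = []
    for i in range(len(close)):
        if i + 1 < n:
            out.append(None)
            continue
        c, v = close[i + 1 - n:i + 1], volume[i + 1 - n:i + 1]
        out.append(sum(ci * vi for ci, vi in zip(c, v)) / sum(v))
    return out


def rsi(close: List[float], n: int) -> List[Optional[float]]:
    """RSI from simple averages of the last n price changes (Cutler's variant)."""
    out: List[Optional[float]] = [None] * len(close)
    changes = [close[i] - close[i - 1] for i in range(1, len(close))]
    for i in range(n, len(close)):
        window = changes[i - n:i]            # the n changes ending at close[i]
        gain = sum(d for d in window if d > 0) / n
        loss = sum(-d for d in window if d < 0) / n
        out[i] = 100.0 if loss == 0 else 100.0 - 100.0 / (1.0 + gain / loss)
    return out


if __name__ == "__main__":
    closes = [10.0, 11.0, 10.5, 10.5, 12.0]
    vols = [100.0, 200.0, 150.0, 120.0, 300.0]
    print(obv(closes, vols))
    print(vwma(closes, vols, 2))
    print(rsi(closes, 2))
```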

Tab. 4 Statistics of dataset division

Dataset	Window	Training period	Training samples	Validation period	Validation samples	Test period	Test samples
SSE 50 constituent stocks	1	2012/01–2017/12	1457	2018/01–06	120	2018/07–12	124
	2	2012/01–2018/06	1577	2018/07–12	124	2019/01–06	118
	3	2012/01–2018/12	1701	2019/01–06	118	2019/07–12	126
	4	2012/01–2019/06	1819	2019/07–12	126	2020/01–06	117
	5	2012/01–2019/12	1945	2020/01–06	117	2020/07–12	126
	6	2012/01–2020/06	2062	2020/07–12	126	2021/01–09	182
CSI 500 constituent stocks	1	2012/01–2018/12	1659	2019/01–06	120	2019/07–12	128
	2	2012/01–2019/06	1779	2019/07–12	128	2020/01–06	117
	3	2012/01–2019/12	1907	2020/01–06	117	2020/07–12	126
	4	2012/01–2020/06	2024	2020/07–12	126	2021/01–06	119
	5	2012/01–2020/12	2150	2021/01–06	119	2021/07–11	95
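The sample counts in Tab. 4 follow an expanding-window (walk-forward) scheme: each window's training set is the previous training set plus the previous validation set, and each validation period reuses the previous test period. The check below verifies both rules for every window of both datasets, using only the counts in the table; since the counts are fully consistent, each training period ends where the previous validation period ends.

```python
from typing import List, Tuple

Window = Tuple[int, int, int]  # (training samples, validation samples, test samples)

# Sample counts copied from Tab. 4.
SSE50: List[Window] = [(1457, 120, 124), (1577, 124, 118), (1701, 118, 126),
                       (1819, 126, 117), (1945, 117, 126), (2062, 126, 182)]
CSI500: List[Window] = [(1659, 120, 128), (1779, 128, 117), (1907, 117, 126),
                        (2024, 126, 119), (2150, 119, 95)]


def inconsistent_windows(windows: List[Window]) -> List[int]:
    """Return 1-based indices of windows that break the expanding-window rules:
    train_k = train_{k-1} + val_{k-1} and val_k = test_{k-1} (same period reused)."""
    bad = []
    for k in range(1, len(windows)):
        prev_train, prev_val, prev_test = windows[k - 1]
        train, val, _ = windows[k]
        if train != prev_train + prev_val or val != prev_test:
            bad.append(k + 1)
    return bad


if __name__ == "__main__":
    print("SSE 50:", inconsistent_windows(SSE50))    # -> []
    print("CSI 500:", inconsistent_windows(CSI500))  # -> []
```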

Fig. 6 Comparison of cumulative returns of different models

Tab. 5 Comparison of evaluation metrics of different models

Model	SSE 50 constituent stocks				CSI 500 constituent stocks
Model	SR	MDD	CR	ARR	SR	MDD	CR	ARR
Buy&Hold	0.40	0.29	0.33	0.09	0.42	0.16	0.40	0.12
Mean-Variance[1]	1.18	0.17	0.36	0.09	0.87	0.10	0.26	0.10
PPO[13]	1.42	0.09	0.71	0.10	1.01	0.15	0.45	0.17
EDRL[16]	0.91	0.55	0.90	0.19	1.21	0.21	0.75	0.28
IDRLPM	1.87	0.14	2.02	0.47	1.88	0.12	1.34	0.44

Note: SR = Sharpe ratio; MDD = maximum drawdown (lower is better); CR = cumulative return; ARR = annualized rate of return.
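Tab. 5 reports SR, MDD, CR and ARR. The sketch below uses their usual definitions for a series of portfolio values $V_0, \dots, V_N$: $CR = V_N / V_0 - 1$, $ARR = (V_N / V_0)^{P/N} - 1$ with $P$ periods per year, $SR = \sqrt{P}\,\bar r / s_r$ on per-period simple returns with a zero risk-free rate, and $MDD = \max_t (\text{peak}_t - V_t) / \text{peak}_t$. The choice P = 252 trading days and the zero risk-free rate are assumptions; the article's exact conventions are not stated in this excerpt.

```python
import math
from typing import List


def simple_returns(values: List[float]) -> List[float]:
    """Per-period simple returns r_t = V_t / V_{t-1} - 1."""
    return [v1 / v0 - 1.0 for v0, v1 in zip(values, values[1:])]


def cumulative_return(values: List[float]) -> float:
    """CR = V_N / V_0 - 1."""
    return values[-1] / values[0] - 1.0


def annualized_return(values: List[float], periods_per_year: int = 252) -> float:
    """ARR = (V_N / V_0) ** (P / N) - 1, with N = number of periods."""
    n = len(values) - 1
    return (values[-1] / values[0]) ** (periods_per_year / n) - 1.0


def sharpe_ratio(values: List[float], periods_per_year: int = 252,
                 risk_free: float = 0.0) -> float:
    """Annualized SR = sqrt(P) * mean excess return / sample std (needs >= 3 values)."""
    excess = [r - risk_free for r in simple_returns(values)]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)


def max_drawdown(values: List[float]) -> float:
    """MDD = largest peak-to-trough decline, as a fraction of the peak."""
    peak, mdd = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd


if __name__ == "__main__":
    path = [1_000_000, 1_012_000, 998_000, 1_020_000, 1_035_000]  # hypothetical values
    print(cumulative_return(path), max_drawdown(path), sharpe_ratio(path))
```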

Tab. 6 Comparison of SR among different models in the rolling test stages

Test period	SSE 50 constituent stocks
Test period	Buy&Hold	Mean-Variance[1]	PPO[13]	EDRL[16]	IDRLPM
2018/07–2018/12	0.13	0.12	0.17	0.19	0.30
2019/01–2019/06	0.11	0.09	0.13	0.17	0.22
2019/07–2019/12	0.06	0.07	0.14	0.14	0.17
2020/01–2020/06	0.02	0.06	0.12	0.26	0.31
2020/07–2020/12	0.01	0.03	0.11	0.14	0.31
2021/01–2021/09	-0.01	0.06	0.09	-0.06	0.10
Test period	CSI 500 constituent stocks
Test period	Buy&Hold	Mean-Variance[1]	PPO[13]	EDRL[16]	IDRLPM
2019/07–2019/12	0.07	0.17	0.23	0.24	0.27
2020/01–2020/06	0.06	0.08	0.14	0.17	0.20
2020/07–2020/12	0.12	0.14	0.09	0.26	0.25
2021/01–2021/06	0.02	0.04	0.05	0.24	0.27
2021/07–2021/11	0.04	-0.08	0.02	0.04	0.11

Fig. 7 Comparison of ablation results for the IDRLPM integrated module

Tab. 7 Comparison of evaluation metrics in ablation experiments on the IDRLPM integrated module

Model	SSE 50 constituent stocks				CSI 500 constituent stocks
Model	SR	MDD	CR	ARR	SR	MDD	CR	ARR
Buy&Hold	0.40	0.29	0.33	0.09	0.42	0.16	0.40	0.12
IDRLPM-rad	1.47	0.17	1.38	0.36	1.35	0.25	1.05	0.32
IDRLPM-mid	1.56	0.14	1.26	0.31	1.41	0.15	0.62	0.23
IDRLPM-con	1.17	0.13	0.71	0.29	1.60	0.10	0.56	0.20
IDRLPM-mean	1.22	0.18	1.15	0.30	1.29	0.17	0.84	0.27
IDRLPM	1.87	0.14	2.02	0.47	1.88	0.12	1.34	0.44

Fig. 8 Comparison of ablation results for the IDRLPM trend prediction module

Tab. 8 Comparison of evaluation metrics in ablation experiments on the IDRLPM trend prediction module

Model	SSE 50 constituent stocks				CSI 500 constituent stocks
Model	SR	MDD	CR	ARR	SR	MDD	CR	ARR
Buy&Hold	0.40	0.29	0.33	0.09	0.42	0.16	0.40	0.12
IDRLPM-DC	1.25	0.15	0.89	0.30	1.38	0.16	0.83	0.29
IDRLPM-ECANet	1.71	0.13	1.75	0.39	1.73	0.15	1.28	0.41
IDRLPM-SENet	1.70	0.14	1.73	0.37	1.81	0.15	1.33	0.43
IDRLPM	1.87	0.14	2.02	0.47	1.88	0.12	1.34	0.44

Fig. 9 Comparison of different models under high-risk market conditions

Fig. 10 Trading example of the model on Shanghai Stock Exchange (SSE) 50 constituent stocks

Fig. 11 Trading example of the model on China Securities Index (CSI) 500 constituent stocks

References (38)

1	MARKOWITZ H M. Portfolio selection [J]. The Journal of Finance, 1952, 7(1): 77-91. 10.1111/j.1540-6261.1952.tb01525.x
2	SHARPE W F. Capital asset prices: a theory of market equilibrium under conditions of risk [J]. The Journal of Finance, 1964, 19(3): 425-442. 10.1111/j.1540-6261.1964.tb02865.x
3	LIU G, MAO Y, SUN Q, et al. Multi-scale two-way deep neural network for stock trend prediction [C]// Proceedings of the 29th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2020: 4555-4561. 10.24963/ijcai.2020/628
4	SHARMA M, SHEKHAWAT H S. Portfolio optimization and return prediction by integrating modified deep belief network and recurrent neural network [J]. Knowledge-Based Systems, 2022, 250: 109024. 10.1016/j.knosys.2022.109024
5	ZHANG R, YUAN Z, SHAO L. A new combined CNN-RNN model for sector stock price analysis [C]// Proceedings of the 2018 IEEE 42nd Annual Computer Software and Applications Conference. Piscataway: IEEE, 2018: 546-551. 10.1109/compsac.2018.10292
6	LIN Y-F, HUANG T-M, CHUNG W-H, et al. Forecasting fluctuations in the financial index using a recurrent neural network based on price features [J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2020, 5(5): 780-791. 10.1109/tetci.2020.2971218
7	MA Y, HAN R, WANG W. Portfolio optimization with return prediction using deep learning and machine learning [J]. Expert Systems with Applications, 2021, 165: 113973. 10.1016/j.eswa.2020.113973
8	OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN [C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2016: 4033-4041.
9	YUAN Y, YU Z L, GU Z, et al. A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning [J]. Knowledge-Based Systems, 2019, 175: 107-117. 10.1016/j.knosys.2019.03.018
10	CARTA S, FERREIRA A, PODDA A S, et al. Multi-DQN: an ensemble of deep Q-learning agents for stock market forecasting [J]. Expert Systems with Applications, 2021, 164: 113820. 10.1016/j.eswa.2020.113820
11	WANG H Z, WU Y L, MIN G Y, et al. Data-driven dynamic resource scheduling for network slicing: a deep reinforcement learning approach [J]. Information Sciences, 2019, 498: 106-116. 10.1016/j.ins.2019.05.012
12	LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6382-6393.
13	LIN S Y, BELING P A. An end-to-end optimal trade execution framework based on proximal policy optimization [C]// Proceedings of the 29th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2020: 4548-4554. 10.24963/ijcai.2020/627
14	LEE J, KIM R, YI S-W, et al. MAPS: multi-agent reinforcement learning-based portfolio management system [C]// Proceedings of the 29th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2020: 4520-4526. 10.24963/ijcai.2020/623
15	LIN Y-C, CHEN C-T, SANG C-Y, et al. Multi-agent based deep reinforcement learning for risk-shifting portfolio management [J]. Applied Soft Computing, 2022, 123: 108894. 10.1016/j.asoc.2022.108894
16	YANG H Y, LIU X-Y, ZHONG S H, et al. Deep reinforcement learning for automated stock trading: an ensemble strategy [C]// Proceedings of the 1st ACM International Conference on AI in Finance. New York: ACM, 2020: Article No. 31. 10.1145/3383455.3422540
17	LEI K, ZHANG B, LI Y, et al. Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading [J]. Expert Systems with Applications, 2020, 140: 112872. 10.1016/j.eswa.2019.112872
18	WANG J, JING F, HE M. Stock trading strategy of reinforcement learning driven by turning point classification [J]. Neural Processing Letters, 2023, 55: 3489-3508. 10.1007/s11063-022-11019-w
19	YE Y, PEI H, WANG B, et al. Reinforcement-learning based portfolio management with augmented asset movement prediction states [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 1112-1119. 10.1609/aaai.v34i01.5462
20	BUROV S, TABEI S M A, HUYNH T, et al. Distribution of directional change as a signature of complex dynamics [J]. Proceedings of the National Academy of Sciences, 2013, 110(49): 19689-19694. 10.1073/pnas.1319473110
21	LIANG Y, LIN Y, LU Q. Forecasting gold price using a novel hybrid model with ICEEMDAN and LSTM-CNN-CBAM [J]. Expert Systems with Applications, 2022, 206: 117847. 10.1016/j.eswa.2022.117847
22	LONG J, CHEN Z, HE W, et al. An integrated framework of deep learning and knowledge graph for prediction of stock price trend: an application in Chinese stock exchange market [J]. Applied Soft Computing, 2020, 91: 106205. 10.1016/j.asoc.2020.106205
23	CAO C F, LUO Z N, XIE J X, et al. Stock price prediction based on MDT-LSTM-CNN model [J]. Computer Engineering and Applications, 2022, 58(5): 280-286. (in Chinese)
24	PATEL J, SHAH S, THAKKAR P, et al. Predicting stock market index using fusion of machine learning techniques [J]. Expert Systems with Applications, 2015, 42: 2162-2172. 10.1016/j.eswa.2014.10.031
25	ZHANG Q, YANG L, ZHOU F. Attention enhanced long short term memory network with multi-source heterogeneous information fusion: an application to BGI genomics [J]. Information Sciences, 2021, 553: 305-330. 10.1016/j.ins.2020.10.023
26	ALMAHDI S, YANG S Y. An adaptive portfolio trading system: a risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown [J]. Expert Systems with Applications, 2017, 87: 267-279. 10.1016/j.eswa.2017.06.023
27	ABOUSSALAH A M, LEE C-G. Continuous control with stacked deep dynamic recurrent reinforcement learning for portfolio optimization [J]. Expert Systems with Applications, 2020, 140: 112891. 10.1016/j.eswa.2019.112891
28	EILERS D, DUNIS C L, VON METTENHEIM H-J, et al. Intelligent trading of seasonal effects: a decision support algorithm based on reinforcement learning [J]. Decision Support Systems, 2014, 64: 100-108. 10.1016/j.dss.2014.04.011
29	DENG Y, KONG Y, BAO F, et al. Sparse coding-inspired optimal trading system for HFT industry [J]. IEEE Transactions on Industrial Informatics, 2015, 11(2): 467-475. 10.1109/tii.2015.2404299
30	JEONG G, KIM H Y. Improving financial trading decisions using deep Q-learning: predicting the number of shares, action strategies, and transfer learning [J]. Expert Systems with Applications, 2019, 117: 125-138. 10.1016/j.eswa.2018.09.036
31	LIU Y, LIU Q, ZHAO H, et al. Adaptive quantitative trading: an imitative deep reinforcement learning approach [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(2): 2128-2135. 10.1609/aaai.v34i02.5587
32	BETANCOURT C, CHEN W-H. Deep reinforcement learning for portfolio management of markets with a dynamic number of assets [J]. Expert Systems with Applications, 2021, 164: 114002. 10.1016/j.eswa.2020.114002
33	LIU X-Y, YANG H, GAO J, et al. FinRL: deep reinforcement learning framework to automate trading in quantitative finance [C]// Proceedings of the 2nd ACM International Conference on AI in Finance. New York: ACM, 2021: Article No. 1. 10.1145/3490354.3494366
34	PUTERMAN M L. Markov decision processes [J]. Handbooks in Operations Research and Management Science, 1990, 2: 331-434. 10.1016/s0927-0507(05)80172-0
35	XU B, HU X, TANG X, et al. Ensemble reinforcement learning-based supervisory control of hybrid electric vehicle for fuel economy improvement [J]. IEEE Transactions on Transportation Electrification, 2020, 6(2): 717-727. 10.1109/tte.2020.2991079
36	LI Q, YU C, YAN G. A new multi-predictor ensemble decision framework based on deep reinforcement learning for regional GDP prediction [J]. IEEE Access, 2022, 10: 45266-45279. 10.1109/access.2022.3170905
37	HU J, SHEN L, SUN G, et al. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745
38	WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539. 10.1109/cvpr42600.2020.01155
