Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (8): 2141-2147. DOI: 10.11772/j.issn.1001-9081.2018010268

• Artificial Intelligence •

Deep reinforcement learning method based on weighted densely connected convolutional network

XIA Min, SONG Wenzhu, SHI Bicheng, LIU Jia

  1. School of Information and Control, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China
  • Received: 2018-01-30  Revised: 2018-04-03  Online: 2018-08-10  Published: 2018-08-11
  • Corresponding author: XIA Min
  • About the authors: XIA Min, born in 1983 in Dongtai, Jiangsu, Ph. D., associate professor, whose research interests include machine learning and big data analysis; SONG Wenzhu, born in 1990 in Suqian, Jiangsu, M. S. candidate, whose research interests include machine learning and big data analysis; SHI Bicheng, born in 1994 in Nantong, Jiangsu, M. S. candidate, whose research interests include machine learning and big data analysis; LIU Jia, born in 1983 in Nanjing, Jiangsu, Ph. D., associate professor, whose research interests include machine learning and big data analysis.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503192, 61773219), the Natural Science Foundation of Jiangsu Province (BK20161533), the Six Talent Peaks Project in Jiangsu Province (2014-XXRJ-007), and the Qing Lan Project of Jiangsu Province.

Abstract: To address the vanishing gradient problem caused by too many convolutional layers in the Convolutional Neural Network (CNN) used for deep reinforcement learning, a deep reinforcement learning method based on a weighted densely connected convolutional network was proposed. Firstly, image features were extracted effectively through the skip-connection structure of the densely connected convolutional network. Secondly, weight coefficients were introduced into the densely connected convolutional network, so that each layer of the weighted network received all the feature maps generated by its preceding layers, with every preceding layer assigned a different initial weight on its skip-connection. Finally, the weight of each layer was adjusted dynamically during training, which allowed features to be extracted more effectively. Compared with the conventional deep reinforcement learning method, the proposed method increased the average reward by 85.67% within the same number of training steps in the GridWorld simulation experiment, and by 55.05% in the FlappyBird simulation experiment. The experimental results show that the proposed method achieves better performance in game simulation experiments of different difficulty levels.
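To make the connectivity described in the abstract concrete, the following PyTorch-style sketch is our own illustration rather than the authors' code; the layer count, growth rate, and the DQN-style head are assumed for the example. It builds a dense block in which every layer receives the feature maps of all preceding layers, and each skip-connection carries its own learnable scalar weight that is adjusted by backpropagation during training.

# Minimal sketch (not the paper's implementation) of weighted dense connectivity
# for a deep reinforcement learning feature extractor. Hyperparameters are assumptions.
import torch
import torch.nn as nn

class WeightedDenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        self.skip_weights = nn.ParameterList()
        channels = in_channels
        for i in range(num_layers):
            # BN-ReLU-Conv composite function, as in DenseNet
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            # one learnable weight per incoming skip-connection (layer i has i + 1 inputs)
            self.skip_weights.append(nn.Parameter(torch.ones(i + 1)))
            channels += growth_rate

    def forward(self, x):
        features = [x]
        for layer, w in zip(self.layers, self.skip_weights):
            # scale every preceding feature map by its skip-connection weight, then concatenate
            weighted = [wi * f for wi, f in zip(w, features)]
            features.append(layer(torch.cat(weighted, dim=1)))
        return torch.cat(features, dim=1)

class DenseQNet(nn.Module):
    """Hypothetical DQN-style head using the weighted dense block as feature extractor."""
    def __init__(self, in_channels, num_actions, growth_rate=16, num_layers=4):
        super().__init__()
        self.block = WeightedDenseBlock(in_channels, growth_rate, num_layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels + num_layers * growth_rate, num_actions)

    def forward(self, x):
        h = self.pool(self.block(x)).flatten(1)
        return self.fc(h)  # one Q-value per action

In such an agent, the block replaces the plain stacked-convolution feature extractor, and the skip-connection weights are optimized together with the convolutional kernels, which is one way to realize the dynamic per-layer weighting described in the abstract.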

Key words: Densely Connected Convolutional Network (DenseNet), deep reinforcement learning, GridWorld, FlappyBird, skip-connection

CLC number: