Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (8): 2141-2147. DOI: 10.11772/j.issn.1001-9081.2018010268

• Artificial Intelligence •

Deep reinforcement learning method based on weighted densely connected convolutional network

XIA Min, SONG Wenzhu, SHI Bicheng, LIU Jia

  1. School of Information and Control, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China
  • Received: 2018-01-30  Revised: 2018-04-03  Online: 2018-08-10  Published: 2018-08-11
  • Corresponding author: XIA Min
  • About the authors: XIA Min, born in 1983 in Dongtai, Jiangsu, Ph. D., associate professor, whose research interests include machine learning and big data analysis; SONG Wenzhu, born in 1990 in Suqian, Jiangsu, M. S. candidate, whose research interests include machine learning and big data analysis; SHI Bicheng, born in 1994 in Nantong, Jiangsu, M. S. candidate, whose research interests include machine learning and big data analysis; LIU Jia, born in 1983 in Nanjing, Jiangsu, Ph. D., associate professor, whose research interests include machine learning and big data analysis.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503192, 61773219), the Natural Science Foundation of Jiangsu Province (BK20161533), the Six Talent Peaks Project in Jiangsu Province (2014-XXRJ-007), and the Qing Lan Project of Jiangsu Province.

Abstract: To address the vanishing gradient problem caused by too many convolutional layers in the Convolutional Neural Network (CNN) used for deep reinforcement learning, a deep reinforcement learning method based on a weighted densely connected convolutional network was proposed. Firstly, image features were extracted effectively through the skip-connection structure of the densely connected convolutional network. Secondly, weight coefficients were introduced into the densely connected convolutional network, so that each layer of the weighted network received all the feature maps generated by its preceding layers, with every preceding layer assigned a different initial weight on its skip-connection. Finally, the weight of each layer was adjusted dynamically during training, which allowed features to be extracted more effectively. Compared with the conventional deep reinforcement learning method, the proposed method increased the average reward by 85.67% within the same number of training steps in the GridWorld simulation experiment, and by 55.05% in the FlappyBird simulation experiment. The experimental results show that the proposed method achieves better performance in game simulation experiments of different difficulty levels.
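To make the connectivity described in the abstract concrete, the following PyTorch-style sketch is our own illustration rather than the authors' code; the layer count, growth rate, and the DQN-style head are assumed for the example. It builds a dense block in which every layer receives the feature maps of all preceding layers, and each skip-connection carries its own learnable scalar weight that is adjusted by backpropagation during training.

# Minimal sketch (not the paper's implementation) of weighted dense connectivity
# for a deep reinforcement learning feature extractor. Hyperparameters are assumptions.
import torch
import torch.nn as nn

class WeightedDenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        self.skip_weights = nn.ParameterList()
        channels = in_channels
        for i in range(num_layers):
            # BN-ReLU-Conv composite function, as in DenseNet
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            # one learnable weight per incoming skip-connection (layer i has i + 1 inputs)
            self.skip_weights.append(nn.Parameter(torch.ones(i + 1)))
            channels += growth_rate

    def forward(self, x):
        features = [x]
        for layer, w in zip(self.layers, self.skip_weights):
            # scale every preceding feature map by its skip-connection weight, then concatenate
            weighted = [wi * f for wi, f in zip(w, features)]
            features.append(layer(torch.cat(weighted, dim=1)))
        return torch.cat(features, dim=1)

class DenseQNet(nn.Module):
    """Hypothetical DQN-style head using the weighted dense block as feature extractor."""
    def __init__(self, in_channels, num_actions, growth_rate=16, num_layers=4):
        super().__init__()
        self.block = WeightedDenseBlock(in_channels, growth_rate, num_layers)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels + num_layers * growth_rate, num_actions)

    def forward(self, x):
        h = self.pool(self.block(x)).flatten(1)
        return self.fc(h)  # one Q-value per action

In such an agent, the block replaces the plain stacked-convolution feature extractor, and the skip-connection weights are optimized together with the convolutional kernels, which is one way to realize the dynamic per-layer weighting described in the abstract.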

Key words: Densely Connected Convolutional Network (DenseNet), deep reinforcement learning, GridWorld, FlappyBird, skip-connection

CLC number: