Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (11): 3242-3249. DOI: 10.11772/j.issn.1001-9081.2019050810

• Artificial Intelligence •

Design of an experience-replay module with high performance

CHEN Bo, WANG Jinyan

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350108, China
  • Received: 2019-05-13  Revised: 2019-08-12  Online: 2019-11-10  Published: 2020-12-11
  • Corresponding author: CHEN Bo
  • About the authors: CHEN Bo, born in 1984 in Fuzhou, Fujian, Ph. D., associate professor, CCF member. His research interests include artificial intelligence and multi-agent simulation. WANG Jinyan, born in 1995 in Fuzhou, Fujian, M. S. candidate. Her research interests include machine learning and multi-agent simulation.
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Fujian Province (2016J01294).




Abstract: A straightforward implementation of the experience-replay procedure based on native python data structures often becomes a performance bottleneck in Deep Q Network (DQN) related applications. To address this problem, a design scheme for a high-performance, general-purpose experience-replay module was proposed. The module consists of two software layers: the lower layer, the "kernel", is written in C++ and implements the fundamental experience-replay functions with high execution efficiency; the upper layer, the "wrapper", is written in python and encapsulates the module's functions behind an object-oriented call interface, keeping the module easy to use. The software structure and algorithms for the critical experience-replay operations were carefully analyzed and designed: the prioritized-replay mechanism is implemented as an accessory component logically separated from the module's main logic, the validity check of samples is moved forward from the "get_batch" operation to the "record" operation, and efficient strategies and algorithms are used for sample eviction. These measures make the module general and extensible. The experimental results show that the experience-replay process implemented with the proposed module is well optimized, and both critical operations, "record" and "get_batch", execute efficiently. Compared with the straightforward implementation based on python data structures, the proposed module performs "get_batch" about 100 times faster, so the experience-replay process is no longer a performance bottleneck of the whole system, meeting the requirements of various DQN-related applications.
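As a rough illustration of the design points above, the following pure-python sketch mimics the two operations the abstract names, "record" and "get_batch". All other names, the 5-tuple transition format, and the uniform-sampling policy are assumptions for illustration only; in the actual module this logic lives in the C++ kernel, and prioritized replay is attached as a separate component.

```python
import random

class ReplayBuffer:
    """Illustrative stand-in for the module's python wrapper interface."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._storage = [None] * capacity  # pre-allocated ring buffer
        self._next = 0                     # next write position
        self._size = 0                     # number of valid entries

    def record(self, transition):
        """Store one (s, a, r, s', done) tuple.

        The sample's validity is checked here, at write time, so that
        get_batch never has to re-validate samples on the hot path.
        """
        if transition is None or len(transition) != 5:
            raise ValueError("expected a 5-tuple transition")
        self._storage[self._next] = transition
        # ring-buffer eviction: the oldest entry is simply overwritten,
        # so eviction costs O(1) and needs no separate bookkeeping pass
        self._next = (self._next + 1) % self.capacity
        self._size = min(self._size + 1, self.capacity)

    def get_batch(self, batch_size):
        """Uniformly sample batch_size stored transitions."""
        if batch_size > self._size:
            raise ValueError("not enough samples recorded yet")
        indices = random.sample(range(self._size), batch_size)
        return [self._storage[i] for i in indices]
```

Because every entry in the buffer is already known to be drawable, sampling reduces to index generation plus list lookups, which is the part the proposed design delegates to the C++ kernel.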

Key words: reinforcement learning, deep learning, Deep Q Network (DQN), experience-replay, software design
