Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (11): 3242-3249. DOI: 10.11772/j.issn.1001-9081.2019050810

• Artificial Intelligence •

Design of an experience-replay module with high performance

CHEN Bo, WANG Jinyan

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350108, China
  • Received: 2019-05-13  Revised: 2019-08-12  Online: 2019-11-10  Published: 2020-12-11
  • Corresponding author: CHEN Bo
  • About the authors: CHEN Bo, born in 1984 in Fuzhou, Fujian, Ph. D., associate professor, CCF member. His research interests include artificial intelligence and multi-agent simulation. WANG Jinyan, born in 1995 in Fuzhou, Fujian, M. S. candidate. Her research interests include machine learning and multi-agent simulation.
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Fujian Province (2016J01294).




Abstract: A straightforward implementation of the experience-replay procedure based on native python data structures often becomes a performance bottleneck in Deep Q Network (DQN) related applications. To address this problem, a design scheme for a high-performance, general-purpose experience-replay module was proposed. The module consists of two software layers: the lower layer, the "kernel", is written in C++ and implements the fundamental experience-replay functions with high execution efficiency; the upper layer, the "wrapper", is written in python and encapsulates the module's functions behind an object-oriented call interface, keeping the module easy to use. The software structure and algorithms for the critical experience-replay operations were carefully analyzed and designed: the prioritized-replay mechanism is implemented as an accessory component logically separated from the module's main logic, the validity check of samples is moved forward from the "get_batch" operation to the "record" operation, and efficient strategies and algorithms are used for sample eviction. These measures make the module general and extensible. The experimental results show that the experience-replay process implemented with the proposed module is well optimized, and both critical operations, "record" and "get_batch", execute efficiently. Compared with the straightforward implementation based on python data structures, the proposed module performs "get_batch" about 100 times faster, so the experience-replay process is no longer a performance bottleneck of the whole system, meeting the requirements of various DQN-related applications.
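As a rough illustration of the design points above, the following pure-python sketch mimics the two operations the abstract names, "record" and "get_batch". All other names, the 5-tuple transition format, and the uniform-sampling policy are assumptions for illustration only; in the actual module this logic lives in the C++ kernel, and prioritized replay is attached as a separate component.

```python
import random

class ReplayBuffer:
    """Illustrative stand-in for the module's python wrapper interface."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._storage = [None] * capacity  # pre-allocated ring buffer
        self._next = 0                     # next write position
        self._size = 0                     # number of valid entries

    def record(self, transition):
        """Store one (s, a, r, s', done) tuple.

        The sample's validity is checked here, at write time, so that
        get_batch never has to re-validate samples on the hot path.
        """
        if transition is None or len(transition) != 5:
            raise ValueError("expected a 5-tuple transition")
        self._storage[self._next] = transition
        # ring-buffer eviction: the oldest entry is simply overwritten,
        # so eviction costs O(1) and needs no separate bookkeeping pass
        self._next = (self._next + 1) % self.capacity
        self._size = min(self._size + 1, self.capacity)

    def get_batch(self, batch_size):
        """Uniformly sample batch_size stored transitions."""
        if batch_size > self._size:
            raise ValueError("not enough samples recorded yet")
        indices = random.sample(range(self._size), batch_size)
        return [self._storage[i] for i in indices]
```

Because every entry in the buffer is already known to be drawable, sampling reduces to index generation plus list lookups, which is the part the proposed design delegates to the C++ kernel.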

Key words: reinforcement learning, deep learning, Deep Q Network (DQN), experience-replay, software design
