Journal of Computer Applications, 2022, Vol. 42, Issue (1): 209-214. DOI: 10.11772/j.issn.1001-9081.2021020239

• Advanced computing

Q-table initialization approach for safe exploration based on factorization machine

Bosen ZENG1,2,3, Yong ZHONG1,2, Xianhua NIU4,5

    1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, Sichuan 610041, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. School of Network and Communication Engineering, Chengdu Technological University, Chengdu, Sichuan 611730, China
    4. National Key Laboratory of Science and Technology on Communications (University of Electronic Science and Technology of China), Chengdu, Sichuan 611731, China
    5. School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan 610039, China
  • Received:2021-02-09 Revised:2021-04-21 Accepted:2021-04-28 Online:2021-05-14 Published:2022-01-10
  • Contact: Bosen ZENG
  • About author: ZENG Bosen, born in 1982, Ph.D. candidate, senior engineer. His research interests include machine learning and wireless communications.
    ZHONG Yong, born in 1966, Ph.D., research fellow. His research interests include big data and its intelligent processing, cloud computing, and software engineering.
    NIU Xianhua, born in 1983, Ph.D., professor. Her research interests include intelligent information processing and information security.
  • Supported by:
    China Postdoctoral Science Foundation (2019M663475)

基于因子分解机用于安全探索的Q表初始化方法

曾柏森1,2,3, 钟勇1,2, 牛宪华4,5

    1.中国科学院 成都计算机应用研究所, 成都 610041
    2.中国科学院大学, 北京 100049
    3.成都工业学院 网络与通信工程学院, 成都 611730
    4.通信抗干扰技术国家级重点实验室(电子科技大学), 成都 611731
    5.西华大学 计算机与软件工程学院, 成都 610039
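  • Corresponding author: 曾柏森 (Bosen ZENG)
  • About the authors: 曾柏森 (ZENG Bosen, born 1982), male, from Dazhou, Sichuan; senior engineer and Ph.D. candidate. Main research interests: machine learning, wireless communications.
    钟勇 (ZHONG Yong, born 1966), male, from Yuechi, Sichuan; research fellow, Ph.D. Main research interests: big data and its intelligent processing, cloud computing, software engineering.
    牛宪华 (NIU Xianhua, born 1983), female, from Xinxiang, Henan; professor, Ph.D. Main research interests: intelligent information processing, information security.
  • Funding:
    Project supported by the China Postdoctoral Science Foundation (2019M663475)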

Abstract:

Most exploration/exploitation strategies in reinforcement learning ignore the risk introduced when the agent selects actions with a random component during exploration. To address this problem, a Q-table initialization approach for safe exploration based on Factorization Machine (FM) was proposed. Firstly, the Q-values that had already been explored were introduced as prior knowledge. Then, FM was used to model the latent interactions between states and actions in this prior knowledge. Finally, the unknown Q-values in the Q-table were predicted with this model to further guide the agent's exploration. A/B testing was conducted in Cliffwalk, a grid-world reinforcement learning environment of OpenAI Gym. With the proposed approach, the number of bad exploration episodes of the Boltzmann and Upper Confidence Bound (UCB) exploration/exploitation strategies was reduced by 68.12% and 89.98%, respectively. Experimental results show that the proposed approach improves the safety of exploration and accelerates convergence at the same time.
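The three steps in the abstract (take explored Q-values as prior knowledge, fit an FM on state-action interactions, predict the unexplored entries) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the one-hot encoding of states and actions, the full-batch gradient-descent training, and every function name and hyperparameter (`one_hot_pairs`, `fit_fm`, `init_q_table`, `k`, `lr`, `epochs`, `reg`) are assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def one_hot_pairs(states, actions, n_states, n_actions):
    """Encode each (state, action) pair as a concatenated one-hot vector (assumed encoding)."""
    states, actions = np.asarray(states), np.asarray(actions)
    X = np.zeros((len(states), n_states + n_actions))
    rows = np.arange(len(states))
    X[rows, states] = 1.0
    X[rows, n_states + actions] = 1.0
    return X

def fm_predict(X, w0, w, V):
    """Second-order FM: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j."""
    pairwise = 0.5 * np.sum((X @ V) ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return w0 + X @ w + pairwise

def fit_fm(X, y, k=4, lr=0.05, epochs=500, reg=1e-3, seed=0):
    """Fit FM parameters by full-batch gradient descent on squared error (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w0, w, V = 0.0, np.zeros(d), 0.01 * rng.standard_normal((d, k))
    for _ in range(epochs):
        err = fm_predict(X, w0, w, V) - y
        XV = X @ V
        # d(pairwise)/dV[i, f] = x_i * (XV)_f - x_i^2 * V[i, f]
        grad_V = X.T @ (err[:, None] * XV) - ((X ** 2).T @ err)[:, None] * V
        w0 -= lr * err.mean()
        w -= lr * (X.T @ err / n + reg * w)
        V -= lr * (grad_V / n + reg * V)
    return w0, w, V

def init_q_table(Q, known, **fit_kwargs):
    """Keep explored Q-values and fill unexplored entries with FM predictions."""
    n_states, n_actions = Q.shape
    s_idx, a_idx = np.indices(Q.shape)
    X_all = one_hot_pairs(s_idx.ravel(), a_idx.ravel(), n_states, n_actions)
    mask = np.asarray(known, dtype=bool).ravel()
    flat = Q.astype(float).ravel().copy()
    params = fit_fm(X_all[mask], flat[mask], **fit_kwargs)
    flat[~mask] = fm_predict(X_all[~mask], *params)
    return flat.reshape(Q.shape)
```

Here `Q` would be a partially explored Q-table and `known` a boolean array of the same shape marking explored entries. The returned table could then serve as the starting point for a Boltzmann or UCB strategy, so that early action choices are biased away from state-action pairs predicted to have low value.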

Key words: reinforcement learning, Q-learning, Factorization Machine (FM), Q-table initialization, safe exploration

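Abstract:

To address the problem that most exploration/exploitation strategies in reinforcement learning ignore the risk caused by the agent selecting actions at random during exploration, a Q-table initialization method for safe exploration based on Factorization Machine (FM) is proposed. First, the explored Q-values in the Q-table are introduced as prior knowledge. Then, FM is used to build a model of the latent interactions between states and actions in the prior knowledge. Finally, the unknown Q-values in the Q-table are predicted with this model, further guiding the agent's exploration. In A/B testing conducted in Cliffwalk, a grid-world reinforcement learning environment of OpenAI Gym, the number of bad exploration episodes of the Boltzmann and Upper Confidence Bound (UCB) exploration/exploitation strategies based on the proposed method decreased by 68.12% and 89.98%, respectively. Experimental results show that the proposed method improves the exploration safety of traditional strategies and accelerates convergence at the same time.

Key words: reinforcement learning, Q-learning, factorization machine, Q-table initialization, safe exploration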
