Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1642-1648. DOI: 10.11772/j.issn.1001-9081.2021050716

• Frontier and Comprehensive Applications •

Industrial process control method based on local policy interaction exploration-based deep deterministic policy gradient

Shaobin DENG1,2,3,4, Jun ZHU1,2,3, Xiaofeng ZHOU1,2,3, Shuai LI1,2,3,4, Shurui LIU1,2,3

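  1. Key Laboratory of Networked Control System, Chinese Academy of Sciences, Shenyang 110016
    2. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110169
    3. Institutes for Robotics and Intelligent Manufacturing Innovation, Chinese Academy of Sciences, Shenyang 110169
    4. University of Chinese Academy of Sciences, Beijing 100049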
  • Received: 2021-05-07 Revised: 2021-09-27 Accepted: 2021-11-26 Online: 2022-03-08 Published: 2022-05-10
  • Corresponding author: ZHOU Xiaofeng
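  • About the authors: DENG Shaobin (born 1997), male, from Ganzhou, Jiangxi, M. S. candidate. Research interests: reinforcement learning, industrial process control.
    ZHU Jun (born 1964), male, from Shenyang, Liaoning, research fellow, M. S. Research interests: automatic control, industrial automation.
    ZHOU Xiaofeng (born 1978), female, from Benxi, Liaoning, associate research fellow, Ph. D. Research interests: machine learning, industrial process optimization. Email: zhouxf@sia.cn
    LI Shuai (born 1988), male, from Jinzhou, Liaoning, associate research fellow, Ph. D. candidate. Research interests: machine learning, data mining.
    LIU Shurui (born 1993), male, from Xiangyang, Hubei, assistant research fellow, M. S. Research interests: industrial process modeling and control, machine learning.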
  • Funding:
    Liaoning Province "Xingliao Talents Plan" Program (XLYC1808009)

Industrial process control method based on local policy interaction exploration-based deep deterministic policy gradient

Shaobin DENG1,2,3,4, Jun ZHU1,2,3, Xiaofeng ZHOU1,2,3, Shuai LI1,2,3,4, Shurui LIU1,2,3

  1. Key Laboratory of Networked Control System, Chinese Academy of Sciences, Shenyang Liaoning 110016, China
    2. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang Liaoning 110169, China
    3. Institutes for Robotics and Intelligent Manufacturing Innovation, Chinese Academy of Sciences, Shenyang Liaoning 110169, China
    4. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2021-05-07 Revised: 2021-09-27 Accepted: 2021-11-26 Online: 2022-03-08 Published: 2022-05-10
  • Contact: Xiaofeng ZHOU
  • About author: DENG Shaobin, born in 1997, M. S. candidate. His research interests include reinforcement learning and industrial process control.
    ZHU Jun, born in 1964, M. S., research fellow. His research interests include automatic control and industrial automation.
    ZHOU Xiaofeng, born in 1978, Ph. D., associate research fellow. Her research interests include machine learning and industrial process optimization.
    LI Shuai, born in 1988, Ph. D. candidate, associate research fellow. His research interests include machine learning and data mining.
    LIU Shurui, born in 1993, M. S., assistant research fellow. His research interests include industrial process modeling and control, and machine learning.
  • Supported by:
    Program of Liaoning Province "Xingliao Talents Plan" (XLYC1808009)

Abstract:

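To achieve stable and precise control of industrial processes that are nonlinear, time-delayed and strongly coupled, a control method based on Local Policy Interaction Exploration-based Deep Deterministic Policy Gradient (LPIE-DDPG) was proposed for continuous control with deep reinforcement learning. First, the Deep Deterministic Policy Gradient (DDPG) algorithm was used as the control policy, greatly reducing overshoot and oscillation during control. Meanwhile, the control policy of the original controller was used as a local policy for searching, and learning followed an interactive exploration rule, which improved learning efficiency and learning stability. Finally, a penicillin fermentation process simulation platform was built under the Gym framework and experiments were conducted. Simulation results show that, compared with DDPG, LPIE-DDPG improves convergence efficiency by 27.3%; compared with Proportional-Integral-Derivative (PID) control, LPIE-DDPG produces less overshoot and oscillation in temperature control and increases the penicillin concentration (yield) by 3.8%. These results indicate that the proposed method can effectively improve training efficiency while improving the stability of industrial process control.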
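The abstract states that the original controller's policy serves as a local policy and that learning follows an interactive exploration rule, but it does not give that rule. The sketch below is one hypothetical reading, not the paper's method: a discrete positional PID stands in for the original controller, and a switch executes the PID action with probability `exp(-decay * episode)` and the actor's noise-perturbed action otherwise. The names `PIDController` and `interactive_action`, the parameters `decay` and `sigma`, the gains and the normalized action range `[0, 1]` are all illustrative assumptions.

```python
import math
import random


class PIDController:
    """Discrete positional PID, a stand-in for the plant's original controller.

    Gains, sample time and output limits are illustrative assumptions,
    not values from the paper.
    """

    def __init__(self, kp, ki, kd, dt, u_min, u_max):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first step

    def act(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, self.u_min), self.u_max)  # saturate to actuator limits


def interactive_action(actor_action, local_action, episode, rng,
                       decay=0.01, sigma=0.1, u_min=0.0, u_max=1.0):
    """Hypothetical exploration rule (an assumption, not the paper's rule).

    With probability exp(-decay * episode) the local (PID) action is executed;
    otherwise the actor's action is perturbed with Gaussian noise. Early
    episodes lean on the existing controller, later ones on the learned policy.
    """
    p_local = math.exp(-decay * episode)
    if rng.random() < p_local:
        action = local_action
    else:
        action = actor_action + rng.gauss(0.0, sigma)
    return min(max(action, u_min), u_max)


if __name__ == "__main__":
    pid = PIDController(kp=1.0, ki=0.2, kd=0.1, dt=1.0, u_min=0.0, u_max=1.0)
    rng = random.Random(0)
    u_local = pid.act(setpoint=1.0, measurement=0.5)  # 0.5 + 0.2 * 0.5 = 0.6
    # At episode 0 the local policy is always chosen, so u equals u_local.
    u = interactive_action(actor_action=0.3, local_action=u_local, episode=0, rng=rng)
    print(u_local, u)
```

Decaying the probability of following the PID action is only one way to hand control over from a known-safe controller to the learned actor; a rule based on comparing critic values of the two candidate actions would be another plausible design.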

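Key words: industrial process control, deep reinforcement learning, deep deterministic policy gradient, local policy interaction exploration, penicillin fermentation process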

Abstract:

To achieve stable and precise control of industrial processes with nonlinearity, time delay and strong coupling, a control method based on Local Policy Interaction Exploration-based Deep Deterministic Policy Gradient (LPIE-DDPG) was proposed for continuous control with deep reinforcement learning. Firstly, the Deep Deterministic Policy Gradient (DDPG) algorithm was used as the control policy to greatly reduce overshoot and oscillation in the control process. At the same time, the control policy of the original controller was used as the local policy for searching, and interactive exploration was used as the learning rule, thereby improving learning efficiency and stability. Finally, a penicillin fermentation process simulation platform was built under the Gym framework and experiments were carried out. Simulation results show that, compared with DDPG, the proposed LPIE-DDPG improves convergence efficiency by 27.3%; compared with Proportional-Integral-Derivative (PID) control, LPIE-DDPG produces less overshoot and oscillation in temperature control and increases the penicillin concentration (yield) by 3.8%. In conclusion, the proposed method can effectively improve training efficiency as well as the stability of industrial process control.
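The abstract uses DDPG as the control policy without restating its update rules. As a reference point, the sketch below shows the three standard DDPG ingredients on scalar parameters: the critic's TD target computed with target networks, one deterministic policy-gradient step for the actor, and the soft (Polyak) target update. The scalar linear actor `mu(s) = theta * s`, the closed-form critic derivative `dq_da` standing in for a learned critic, and the values of `gamma`, `tau` and `lr` are illustrative assumptions; the paper's network architecture and hyperparameters are not given in this section.

```python
def td_target(reward, done, next_q, gamma=0.99):
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q is the target critic's value at the target actor's action.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]


def actor_step(theta, states, dq_da, lr=0.1):
    """One deterministic policy-gradient step for a scalar linear actor mu(s) = theta * s.

    grad J = batch mean of dQ/da evaluated at a = mu(s), times dmu/dtheta = s.
    """
    grad = sum(dq_da(s, theta * s) * s for s in states) / len(states)
    return theta + lr * grad  # gradient ascent on Q


def example_dq_da(s, a):
    """Hypothetical critic Q(s, a) = -(a - 0.5 * s)**2, so dQ/da = -2 * (a - 0.5 * s)."""
    return -2.0 * (a - 0.5 * s)


if __name__ == "__main__":
    theta = 0.0
    for _ in range(50):
        theta = actor_step(theta, [1.0, -1.0, 2.0], example_dq_da)
    print(round(theta, 6))  # converges to the critic's optimum, 0.5
```

With this critic each step shrinks `theta - 0.5` by a factor of 0.6, so the actor converges to the action that maximizes Q; in the real algorithm both actor and critic are neural networks trained from replayed transitions.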

Key words: industrial process control, deep reinforcement learning, Deep Deterministic Policy Gradient (DDPG), Local Policy Interaction Exploration (LPIE), penicillin fermentation process
