Journal of Computer Applications (《计算机应用》) official website


Ultra-short-term photovoltaic power prediction by deep reinforcement learning based on attention mechanism

DING Zhengkai1,2, FU Qiming1,2, CHEN Jianping2,3,4, LU You1,2, WU Hongjie1, FANG Nengwei4, XING Bin4

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency (Suzhou University of Science and Technology), Suzhou, Jiangsu 215009, China; 3. School of Architecture and Urban Planning, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 4. Chongqing Industrial Big Data Innovation Center Co., Ltd., Chongqing 400707, China

  • Received: 2022-04-18; Revised: 2022-06-13; Accepted: 2022-06-15; Online: 2022-07-01; Published: 2022-07-01
  • Corresponding author: FU Qiming
  • Supported by:
    National Key Research and Development Program of China; National Natural Science Foundation of China (five grants); Key Research and Development Program of Jiangsu Province; Natural Science Foundation of the Higher Education Institutions of Jiangsu Province; Natural Science Foundation of Jiangsu Province

Abstract: To address the problem that traditional models are affected by random power fluctuations and tend to ignore important information, resulting in low prediction accuracy, two attention-based models, ADDPG and ARDPG, were proposed on the basis of Deep Deterministic Policy Gradient (DDPG) and Recurrent Deterministic Policy Gradient (RDPG), respectively, and a PhotoVoltaic (PV) power prediction framework was built on them. Firstly, the original PV power data and meteorological data were standardized. Then, the PV power prediction problem was modeled as a Markov Decision Process (MDP), with historical power data and current meteorological data used as the state. Secondly, the attention mechanism was added to the Actor networks of DDPG and RDPG, assigning a weight to each component of the state so as to highlight important and critical information, ignore irrelevant information, and reduce the instability introduced by the raw PV power, thereby producing the optimal action, namely the power prediction value. Thirdly, Deep Reinforcement Learning (DRL) agents interacted with the historical data to learn the key information in the data. Finally, the MDP problem was solved to obtain the optimal policy and make accurate predictions. The proposed models were evaluated on data from the DKASC Alice Springs PV system. Experimental results show that, under the Root Mean Square Error (RMSE) metric, the prediction errors of ADDPG and ARDPG are 24.24% and 31.34% lower than those of DDPG and RDPG, respectively, and 26.89% and 18.76% lower than that of a Long Short-Term Memory (LSTM) model with attention mechanism. The proposed models can effectively improve the prediction accuracy of PV power, and can also be extended to other prediction fields such as power grid prediction and wind power prediction.
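The attention step described above — weighting each component of the MDP state (historical power values plus current weather features) before the Actor maps the state to an action, i.e. the predicted power — can be sketched as follows. This is a minimal NumPy illustration under assumed shapes; the parameter names (`w_score`, `w_out`) and the single-layer form are illustrative, not the authors' network.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def attention_actor(state, w_score, w_out):
    """Toy attention-weighted Actor: each state component gets a
    softmax attention weight, then a linear map turns the
    re-weighted state into a scalar action (the power prediction).
    Parameters here are hypothetical stand-ins for learned weights."""
    scores = state * w_score               # per-component relevance scores
    alpha = softmax(scores)                # attention weights, sum to 1
    context = alpha * state                # state with important components emphasized
    return float(context @ w_out), alpha   # scalar predicted power, weights

rng = np.random.default_rng(0)
state = rng.normal(size=8)    # e.g. 6 standardized lagged power values + 2 weather features
w_score = rng.normal(size=8)
w_out = rng.normal(size=8)
pred, alpha = attention_actor(state, w_score, w_out)
assert np.isclose(alpha.sum(), 1.0)        # attention weights form a distribution
```

In a full DDPG/RDPG setup these weights would be trained end-to-end with the Actor, so components that consistently help the prediction receive larger attention weights.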

Key words: deep reinforcement learning, photovoltaic power prediction, deep learning, Deep Deterministic Policy Gradient(DDPG), Recurrent Deterministic Policy Gradient(RDPG), attention mechanism
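For reference, the RMSE metric used in the evaluation, and the relative error reduction the abstract reports (e.g. 24.24% for ADDPG versus DDPG), can be computed as below. The arrays are toy values, not the DKASC Alice Springs measurements.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root Mean Square Error: square root of the mean squared residual.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def reduction_pct(baseline_rmse, model_rmse):
    # Relative error reduction of a model over a baseline, in percent.
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

y = [3.0, 4.1, 5.2, 4.8]           # toy PV power readings (kW)
baseline = [2.5, 4.6, 4.7, 5.5]    # toy baseline model predictions
model = [2.9, 4.2, 5.0, 4.9]       # toy improved model predictions

# The improved model has a lower RMSE, so its error reduction is positive.
assert rmse(y, model) < rmse(y, baseline)
print(reduction_pct(rmse(y, baseline), rmse(y, model)))
```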

CLC number: