Journal of Computer Applications, 2021, Vol. 41, Issue (1): 243-248. DOI: 10.11772/j.issn.1001-9081.2020060928

Special Issue: The 8th China Conference on Data Mining (CCDM 2020)

Time series imputation model based on long-short term memory network with residual connection

QIAN Bin¹, ZHENG Kaihong¹, CHEN Zipeng², XIAO Yong¹, LI Sen², YE Chunzhuang¹, MA Qianli²

  1. Electric Power Research Institute, China Southern Power Grid International Company Limited, Guangzhou, Guangdong 510663, China;
  2. School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
  • Received: 2020-05-30; Revised: 2020-07-21; Online: 2021-01-10; Published: 2020-11-12
  • Supported by:
    This work is partially supported by the Key Project of the National Natural Science Foundation of China (61751205) and the National Natural Science Foundation of China (61872148).

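  • Corresponding author: MA Qianli
  • About the authors: QIAN Bin (born 1989), male, from Shiyan, Hubei; engineer, M.S.; research interest: electric energy metering. ZHENG Kaihong (born 1991), male, from Shantou, Guangdong; engineer, M.S.; research interests: electric energy metering, automated electric energy metering systems, electricity utilization technology. CHEN Zipeng (born 1996), male, from Jieyang, Guangdong; M.S. candidate; research interests: data mining, neural networks. XIAO Yong (born 1978), male, from Huaihua, Hunan; senior engineer, Ph.D.; research interests: electric energy metering management, automated electric energy metering systems, electricity utilization technology. LI Sen (born 1994), male, from Maoming, Guangdong; M.S. candidate; research interests: data mining, machine learning, neural networks. YE Chunzhuang (born 1989), male, from Haikou, Hainan; engineer; research interest: power line loss management. MA Qianli (born 1980), male, from Tanchang, Gansu; professor, Ph.D.; research interests: data mining, machine learning, neural networks.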

Abstract: Traditional time series imputation methods typically assume that the data are generated by a linear dynamic system, whereas real-world time series are more often non-linear. To address this, a time series imputation model based on a Long Short-Term Memory (LSTM) network with residual connection, called RSI-LSTM (ReSidual Imputation Long-Short Term Memory), was proposed to capture the non-linear dynamics of time series effectively and to mine the latent relation between missing data and the most recent non-missing data. Specifically, the LSTM network was used to model the underlying non-linear dynamics of the time series, while the residual connection was introduced to mine the relation between historical values and the missing value, thereby improving the imputation capability of the model. RSI-LSTM was first applied to impute missing data in a univariate daily power supply dataset. Then, on the power load dataset from Problem A of the 9th Electrical Engineering Mathematical Modeling Competition, meteorological factors were introduced as multivariate input to RSI-LSTM to improve its imputation of missing values. In addition, two general multivariate time series datasets were used to verify the missing value imputation ability of the model. Experimental results show that RSI-LSTM achieves better imputation performance than LSTM on both univariate and multivariate datasets, with a Mean Square Error (MSE) that is 10% lower overall.

Key words: missing value imputation, Long Short-Term Memory (LSTM) network, residual connection, time series, temporal dependency
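
The abstract does not give the model equations, so the following is only an illustrative sketch of the general idea it describes, not the authors' RSI-LSTM. It assumes a scalar LSTM cell with fixed, untrained weights and fills each missing value with the most recent observed value (the skip path) plus a correction derived from the LSTM hidden state. The names `TinyLSTMCell`, `impute`, and `w_out`, and all weight values, are hypothetical.

```python
import math
from typing import List, Optional


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTMCell:
    """Scalar LSTM cell (hidden size 1) with fixed, untrained weights."""

    def __init__(self) -> None:
        # (w_x, w_h, b) per gate; arbitrary illustrative values, not learned.
        self.params = {
            "i": (0.5, 0.1, 0.0),  # input gate
            "f": (0.5, 0.1, 1.0),  # forget gate
            "o": (0.5, 0.1, 0.0),  # output gate
            "g": (0.8, 0.2, 0.0),  # candidate cell state
        }

    def _pre(self, gate: str, x: float, h: float) -> float:
        w_x, w_h, b = self.params[gate]
        return w_x * x + w_h * h + b

    def step(self, x: float, h: float, c: float):
        i = sigmoid(self._pre("i", x, h))
        f = sigmoid(self._pre("f", x, h))
        o = sigmoid(self._pre("o", x, h))
        g = math.tanh(self._pre("g", x, h))
        c_new = f * c + i * g           # update the cell state
        h_new = o * math.tanh(c_new)    # expose the gated hidden state
        return h_new, c_new


def impute(series: List[Optional[float]], w_out: float = 0.3) -> List[float]:
    """Fill None entries with last observed value + hidden-state correction."""
    cell = TinyLSTMCell()
    h, c = 0.0, 0.0
    last_observed = 0.0  # assumption: fallback when the series starts with a gap
    filled: List[float] = []
    for x in series:
        if x is None:
            # Residual-style fill: the skip path carries the recent observed
            # value, and the LSTM state only adds a correction on top of it.
            x = last_observed + w_out * h
        else:
            last_observed = x
        filled.append(x)
        h, c = cell.step(x, h, c)  # the filled value feeds the next step
    return filled


if __name__ == "__main__":
    print(impute([0.2, 0.4, None, 0.5, None, None, 0.1]))
```

In a trained model the gate weights and `w_out` would be learned, for example by minimizing the MSE between imputed and held-out values. The point of the sketch is only the data flow: observed values pass through unchanged, and gaps are filled from history rather than from the LSTM output alone.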
