Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (8): 2582-2591. DOI: 10.11772/j.issn.1001-9081.2024071046

• Data Science and Technology •

Incremental missing value imputation algorithm for time series based on diffusion model

Xingjie FENG1, Xingpeng BIAN1, Xiaorong FENG2, Xinglong WANG2

  1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  2. College of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2024-07-26 Revised: 2024-09-29 Accepted: 2024-10-11 Online: 2024-11-19 Published: 2025-08-10
  • Contact: Xiaorong FENG
  • About the authors: FENG Xingjie, born in 1969 in Xingtai, Hebei, Ph. D., professor. His research interests include data warehousing and intelligent information processing.
    BIAN Xingpeng, born in 1998 in Xinghua, Jiangsu, M. S. candidate, CCF member. His research interests include deep learning and flight data processing.
    WANG Xinglong, born in 1979 in Bei'an, Heilongjiang, M. S., research fellow. His research interests include airspace operational safety and flight data processing.
  • Supported by:
    Key Program of National Natural Science Foundation of China (U2133207); General Program of National Natural Science Foundation of China (62173332); National Defense Technology 173 Project (2022-JCJQ-JJ-0874); Fundamental Research Funds for the Central Universities (3122020051); Postgraduate Scientific Research Innovation Project of Civil Aviation University of China (2023YJSKC05002)

Abstract:

Missing data is a common problem in time series and complicates subsequent analysis, so effective missing value imputation is crucial for improving data quality and mining the value of data. However, for feature extraction, existing imputation algorithms mostly reuse attention modules that were designed for complete data in time series prediction tasks, and these modules extract spatio-temporal features poorly from time series with missing values. In addition, existing imputation algorithms rarely study imputation patterns in depth and therefore make insufficient use of the intermediate values generated during the imputation process, which leaves room for improvement in imputation accuracy. To address these problems, an Incremental missing value Imputation algorithm for Time series based on Diffusion Model (I2TDM) was proposed. In I2TDM, a temporal attention module was incorporated into the classical diffusion model to enhance feature extraction from time series with missing values. In addition, a novel incremental imputation algorithm was designed, in which an incremental selection module retains part of the intermediate imputed values, to improve the stability and accuracy of imputation. Imputation experiments on three public datasets, Air Quality Index (AQI), Electricity Transformer Temperature (ETT) and Weather, show that compared with baseline models such as CSDI, SAITS and PriSTI, I2TDM reduces the Mean Absolute Error (MAE) by at least 2.92% and the Root Mean Square Error (RMSE) by at least 3.49%. These results indicate that I2TDM effectively improves the accuracy of missing value imputation for time series.

Key words: time series, missing value imputation, diffusion model, temporal attention, incremental imputation
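
The abstract reports results as percentage reductions in MAE and RMSE. As a rough illustration of how such figures are commonly computed in imputation experiments, the sketch below evaluates both metrics only at positions that were held out as artificially missing, then expresses the gain relative to a baseline. This is not the authors' evaluation code: the function names, the evaluation mask, and the toy numbers are all assumptions made for illustration.

```python
import math


def masked_errors(y_true, y_pred, eval_mask):
    """Signed errors at the positions selected by eval_mask (hypothetical helper)."""
    errors = [t - p for t, p, m in zip(y_true, y_pred, eval_mask) if m]
    if not errors:
        raise ValueError("eval_mask selects no positions")
    return errors


def masked_mae(y_true, y_pred, eval_mask):
    """Mean absolute error over the evaluated (held-out) positions."""
    errors = masked_errors(y_true, y_pred, eval_mask)
    return sum(abs(e) for e in errors) / len(errors)


def masked_rmse(y_true, y_pred, eval_mask):
    """Root mean square error over the evaluated (held-out) positions."""
    errors = masked_errors(y_true, y_pred, eval_mask)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Toy series (made-up numbers, not taken from the paper).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [1.0, 2.5, 3.0, 4.0, 4.0, 6.5]
# True marks positions that were hidden from the model and then imputed.
eval_mask = [False, True, False, False, True, True]

print(f"MAE  = {masked_mae(y_true, y_pred, eval_mask):.4f}")   # MAE  = 0.6667
print(f"RMSE = {masked_rmse(y_true, y_pred, eval_mask):.4f}")  # RMSE = 0.7071
# A hypothetical baseline MAE of 0.50 against a proposed MAE of 0.48.
print(f"reduction = {relative_reduction(0.50, 0.48):.2f}%")    # reduction = 4.00%
```

Restricting the metrics to the masked positions matters because observed values are copied through unchanged; including them would dilute the error and make different missing rates incomparable.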

CLC number: