Journal of Computer Applications, 2025, Vol. 45, Issue 8: 2582-2591. DOI: 10.11772/j.issn.1001-9081.2024071046
• Data Science and Technology •
Xingjie FENG¹, Xingpeng BIAN¹, Xiaorong FENG², Xinglong WANG²
Received: 2024-07-26
Revised: 2024-09-29
Accepted: 2024-10-11
Online: 2024-11-19
Published: 2025-08-10
Contact: Xiaorong FENG
About author: FENG Xingjie, born in 1969 in Xingtai, Hebei, Ph.D., professor. His research interests include data warehouse and intelligent information processing.
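Abstract:
Missing data is a pervasive problem in time series and complicates downstream analysis, so effective imputation of missing values is key to improving data quality and extracting value from the data. However, for feature extraction, existing imputation algorithms mostly reuse attention modules designed for time series forecasting, which assume non-missing data, and these modules extract spatiotemporal features poorly from time series that contain missing values. In addition, existing imputation algorithms have not studied the imputation process itself in depth, so they make little use of the intermediate imputed values produced along the way, which leaves room to improve imputation accuracy. To address these problems, an incremental missing value imputation algorithm for time series based on the diffusion model (I2TDM) is proposed. I2TDM integrates a temporal attention module into the classical diffusion model to strengthen feature extraction from time series with missing values. It also introduces a novel incremental imputation algorithm that uses an incremental selection module to retain part of the intermediate imputed values, which improves the stability and accuracy of imputation. Imputation experiments on three public datasets, Air Quality Index (AQI), Electricity Transformer Temperature (ETT), and Weather, show that, compared with baseline models such as CSDI, SAITS, and PriSTI, I2TDM reduces Mean Absolute Error (MAE) by at least 2.92% and Root Mean Square Error (RMSE) by at least 3.49%. I2TDM can therefore effectively improve the accuracy of missing value imputation for time series.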
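The abstract describes the incremental idea only at a high level: impute, retain part of the intermediate imputed values, then continue. The toy sketch below illustrates that loop on a 1-D series and is not the authors' implementation. Everything in it is an assumption made for illustration: the stand-in sampler `draw_candidates` (a noisy neighbour average instead of a diffusion model), the use of sample spread as the selection criterion, and the parameters `n_rounds`, `keep_ratio`, and `n_samples`.

```python
import random
import statistics


def draw_candidates(series, i, n_samples, rng):
    """Stand-in sampler (assumption, not a diffusion model): average of the
    nearest known neighbours plus noise that grows with the distance to the
    nearest known value. Requires at least one known value in `series`."""
    left = next((j for j in range(i - 1, -1, -1) if series[j] is not None), None)
    right = next((j for j in range(i + 1, len(series)) if series[j] is not None), None)
    neighbours = [j for j in (left, right) if j is not None]
    centre = statistics.fmean(series[j] for j in neighbours)
    distance = min(abs(i - j) for j in neighbours)
    return [rng.gauss(centre, 0.1 * distance) for _ in range(n_samples)]


def incremental_impute(series, n_rounds=3, keep_ratio=0.5, n_samples=50, seed=0):
    """Fill None entries over several rounds, keeping only the least-spread
    fills in each round and treating them as known in later rounds."""
    rng = random.Random(seed)
    work = list(series)
    for r in range(n_rounds):
        missing = [i for i, v in enumerate(work) if v is None]
        if not missing:
            break
        # Sample every missing position against the values known at round start.
        stats = {}
        for i in missing:
            cands = draw_candidates(work, i, n_samples, rng)
            stats[i] = (statistics.median(cands), statistics.stdev(cands))
        # Hypothetical selection rule: smallest sample spread first.
        ranked = sorted(missing, key=lambda i: stats[i][1])
        last_round = r == n_rounds - 1
        n_keep = len(ranked) if last_round else max(1, int(len(ranked) * keep_ratio))
        for i in ranked[:n_keep]:
            work[i] = stats[i][0]
    return work


if __name__ == "__main__":
    print(incremental_impute([1.0, None, None, None, 2.0, None, 3.0]))
```

In this toy, positions next to observed values tend to have the smallest spread, so they are retained first and then serve as conditioning values for the positions that are still missing.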
Xingjie FENG, Xingpeng BIAN, Xiaorong FENG, Xinglong WANG. Incremental missing value imputation algorithm for time series based on diffusion model[J]. Journal of Computer Applications, 2025, 45(8): 2582-2591.
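Algorithm | State of ground-truth values at run time | State of model-generated values at run time |
---|---|---|
Prediction algorithm | Ground truth does not yet exist | Predicted values cannot be evaluated |
Imputation algorithm | Ground truth exists but is unobservable | Imputed values can be evaluated indirectly |
Tab. 1 Difference between imputation algorithms and prediction algorithms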
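One common way to evaluate imputed values indirectly, as Tab. 1 puts it, is to hide a share of the values that were actually observed, impute them, and score only those hidden positions. The sketch below shows that protocol with a median imputer standing in for the model. The helper names `hold_out`, `median_impute`, and `mae_rmse` are hypothetical, and the exact masking protocol used in the paper is not stated in this page.

```python
import math
import random
import statistics


def hold_out(series, ratio, seed=0):
    """Hide `ratio` of the observed values; return (masked series, hidden truth)."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    hidden = rng.sample(observed, round(len(observed) * ratio))
    masked = list(series)
    truth = {}
    for i in hidden:
        truth[i] = masked[i]
        masked[i] = None
    return masked, truth


def median_impute(series):
    """Baseline imputer: replace every missing entry with the median of known values."""
    fill = statistics.median(v for v in series if v is not None)
    return [fill if v is None else v for v in series]


def mae_rmse(imputed, truth):
    """Score only the positions whose true value was deliberately hidden."""
    errors = [imputed[i] - v for i, v in truth.items()]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


if __name__ == "__main__":
    data = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0, 9.0, 10.0]
    masked, truth = hold_out(data, ratio=0.5)
    print(mae_rmse(median_impute(masked), truth))
```

Entries that were missing in the raw data never enter the score, because their true values are unobservable.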
Dataset | Number of samples | Dimensions | Original missing rate/% |
---|---|---|---|
AQI | 8 760 | 36 | 13.3 |
ETT | 17 421 | 6 | 0.0 |
Weather | 52 697 | 21 | 0.0 |
Tab. 2 Statistical results of datasets
Dataset | batch_size | epoch | loss | learning_rate | diff_steps | res_channels | n_samples |
---|---|---|---|---|---|---|---|
AQI | 16 | 100 | huber | 0.0010 | 50 | 64 | 500 |
ETT | 16 | 100 | huber | 0.0005 | 50 | 64 | 300 |
Weather | 16 | 100 | huber | 0.0005 | 50 | 64 | 500 |
Tab. 3 I2TDM hyperparameter setting
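Tab. 3 lists `huber` as the training loss. For reference, the standard Huber loss is quadratic for small residuals and linear for large ones, which makes it less sensitive to outliers than squared error. The sketch below is a plain-Python version. The threshold `delta=1.0` is an assumption, since the page does not give the value used, and `mean_huber` is a hypothetical helper name.

```python
def huber(residual, delta=1.0):
    """Standard Huber loss for one residual; `delta` is an assumed threshold."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a            # quadratic region
    return delta * (a - 0.5 * delta)  # linear region


def mean_huber(predicted, target, delta=1.0):
    """Average Huber loss over paired predictions and targets."""
    pairs = list(zip(predicted, target))
    return sum(huber(p - t, delta) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    print(mean_huber([0.5, 3.0], [0.0, 1.0]))
```

The two branches meet at `|residual| == delta`, so the loss and its slope are continuous there.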
Missing value ratio/% | Metric | Median | BRITS | GAIN | SAITS | CSDI | SSSD | PriSTI | I2TDM |
---|---|---|---|---|---|---|---|---|---|
10 | MAE | 65.83 | 12.47 | 26.99 | 10.29 | 7.13 | 11.96 | 7.90 | 6.83 |
10 | RMSE | 93.47 | 21.29 | 57.17 | 18.81 | 12.71 | 20.22 | 14.95 | 12.12 |
20 | MAE | 66.12 | 13.52 | 26.90 | 10.60 | 7.39 | 12.70 | 8.74 | 7.12 |
20 | RMSE | 92.07 | 22.63 | 57.07 | 19.31 | 13.22 | 21.51 | 16.90 | 12.53 |
50 | MAE | 67.13 | 18.86 | 27.50 | 12.34 | 8.64 | 14.73 | 12.04 | 8.40 |
50 | RMSE | 106.02 | 31.69 | 57.32 | 22.37 | 15.68 | 25.35 | 23.95 | 15.01 |
90 | MAE | 80.22 | 41.75 | 32.83 | 17.06 | 14.27 | 28.47 | 33.27 | 14.12 |
90 | RMSE | 124.21 | 62.27 | 60.91 | 30.33 | 24.70 | 46.34 | 59.44 | 24.74 |
Tab. 4 Experimental results of missing value imputation on AQI dataset
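To read Tab. 4 row by row, the relative error reduction of I2TDM against a baseline can be computed as (baseline - I2TDM) / baseline. The sketch below does this against CSDI, which has the lowest error among the baselines in every row of Tab. 4. The values are copied from the table, and the names `TAB4_CSDI_VS_I2TDM` and `relative_reduction` are hypothetical. This page does not state how the aggregate percentages in the abstract were computed, so these per-row figures are not expected to match them.

```python
# (missing value ratio/%, metric) -> (CSDI error, I2TDM error), copied from Tab. 4
TAB4_CSDI_VS_I2TDM = {
    (10, "MAE"): (7.13, 6.83),
    (10, "RMSE"): (12.71, 12.12),
    (20, "MAE"): (7.39, 7.12),
    (20, "RMSE"): (13.22, 12.53),
    (50, "MAE"): (8.64, 8.40),
    (50, "RMSE"): (15.68, 15.01),
    (90, "MAE"): (14.27, 14.12),
    (90, "RMSE"): (24.70, 24.74),
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error relative to `baseline`.
    A negative result means the proposed error is higher than the baseline."""
    return (baseline - proposed) / baseline * 100.0


if __name__ == "__main__":
    for (ratio, metric), (csdi, i2tdm) in TAB4_CSDI_VS_I2TDM.items():
        print(f"{ratio:>2}% {metric:<4} {relative_reduction(csdi, i2tdm):+.2f}%")
```

The sign of each result shows, per setting, whether I2TDM has the lower error in that row.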
Missing value ratio/% | Metric | Median | BRITS | GAIN | SAITS | CSDI | SSSD | PriSTI | I2TDM |
---|---|---|---|---|---|---|---|---|---|
10 | MAE | 2.53 | 0.47 | 1.09 | 0.35 | 0.25 | 0.51 | 0.47 | 0.23 |
10 | RMSE | 4.63 | 1.05 | 2.80 | 0.97 | 0.48 | 1.17 | 0.91 | 0.43 |
20 | MAE | 2.57 | 0.54 | 1.13 | 0.38 | 0.29 | 0.54 | 0.52 | 0.27 |
20 | RMSE | 4.52 | 1.18 | 2.83 | 1.00 | 0.61 | 1.17 | 1.02 | 0.52 |
50 | MAE | 2.94 | 0.83 | 1.45 | 0.52 | 0.44 | 0.81 | 0.63 | 0.41 |
50 | RMSE | 4.71 | 1.66 | 3.22 | 1.18 | 1.00 | 1.89 | 1.26 | 0.90 |
90 | MAE | 3.72 | 2.29 | 3.21 | 1.17 | 1.07 | 1.83 | 1.51 | 1.14 |
90 | RMSE | 5.43 | 4.21 | 5.74 | 2.49 | 2.33 | 3.51 | 3.26 | 2.38 |
Tab. 5 Experimental results of missing value imputation on ETT-h1 dataset
Missing value ratio/% | Metric | Median | BRITS | GAIN | SAITS | CSDI | SSSD | PriSTI | I2TDM |
---|---|---|---|---|---|---|---|---|---|
10 | MAE | 66.23 | 6.61 | 20.37 | 3.96 | 3.02 | 6.67 | 5.68 | 2.87 |
10 | RMSE | 188.88 | 35.81 | 100.09 | 28.97 | 21.21 | 39.36 | 37.74 | 19.40 |
20 | MAE | 77.34 | 9.12 | 19.79 | 4.31 | 3.39 | 7.96 | 5.77 | 3.27 |
20 | RMSE | 185.29 | 43.75 | 97.88 | 29.65 | 26.48 | 53.90 | 37.34 | 24.26 |
50 | MAE | 114.93 | 24.31 | 20.98 | 5.88 | 4.41 | 11.35 | 6.62 | 4.28 |
50 | RMSE | 256.64 | 79.73 | 101.36 | 38.76 | 32.37 | 65.57 | 42.20 | 31.08 |
90 | MAE | 165.42 | 67.90 | 50.01 | 11.33 | 9.07 | 29.38 | 15.71 | 8.71 |
90 | RMSE | 375.36 | 191.95 | 156.47 | 58.56 | 50.57 | 117.72 | 82.01 | 47.58 |
Tab. 6 Experimental results of missing value imputation on Weather dataset
Missing rate/% | Metric | No TAM | No ISM | I2TDM |
---|---|---|---|---|
10 | MAE | 8.38 | 6.84 | 6.83 |
10 | RMSE | 16.17 | 12.23 | 12.12 |
50 | MAE | 12.85 | 8.45 | 8.40 |
50 | RMSE | 25.28 | 15.15 | 15.01 |
90 | MAE | 38.22 | 14.22 | 14.12 |
90 | RMSE | 61.43 | 24.80 | 24.74 |
Tab. 7 Ablation experiment results