Incremental missing value imputation algorithm for time series based on diffusion model

Journal of Computer Applications, 2025, Vol. 45, Issue (8): 2582-2591. DOI: 10.11772/j.issn.1001-9081.2024071046

• Data Science and Technology •

Xingjie FENG¹, Xingpeng BIAN¹, Xiaorong FENG², Xinglong WANG²

¹ College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
² College of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China

Received: 2024-07-26  Revised: 2024-09-29  Accepted: 2024-10-11  Online: 2024-11-19  Published: 2025-08-10
Corresponding author: Xiaorong FENG
About the authors: FENG Xingjie, born in 1969 in Xingtai, Hebei, Ph.D., professor. His research interests include data warehousing and intelligent information processing.
BIAN Xingpeng, born in 1998 in Xinghua, Jiangsu, M.S. candidate, CCF member. His research interests include deep learning and flight data processing.
WANG Xinglong, born in 1979 in Bei'an, Heilongjiang, M.S., research fellow. His research interests include airspace operational safety and flight data processing.
Supported by:
Key Program of National Natural Science Foundation of China (U2133207); General Program of National Natural Science Foundation of China (62173332); National Defense Technology 173 Project (2022-JCJQ-JJ-0874); Fundamental Research Funds for the Central Universities (3122020051); Postgraduate Scientific Research Innovation Project of Civil Aviation University of China (2023YJSKC05002)

Abstract:

Missing data is common in time series and complicates downstream analysis, so effective missing value imputation is important for improving data quality and extracting value from data. However, existing imputation algorithms mostly reuse attention modules designed for complete data in time series forecasting tasks, and these modules extract spatio-temporal features poorly from time series that contain missing values. In addition, existing algorithms rarely study imputation patterns in depth and therefore make little use of the intermediate values produced during imputation, which leaves room to improve imputation accuracy. To address these problems, an Incremental missing value Imputation algorithm for Time series based on Diffusion Model (I2TDM) was proposed. I2TDM incorporates a temporal attention module into the classical diffusion model to strengthen feature extraction for time series with missing values. It also introduces a novel incremental imputation algorithm that uses an incremental selection module to retain some of the intermediate imputed values, improving the stability and accuracy of imputation. Imputation experiments on three public datasets, Air Quality Index (AQI), Electricity Transformer Temperature (ETT) and Weather, show that compared with baseline models such as CSDI, SAITS and PriSTI, I2TDM reduces the Mean Absolute Error (MAE) by at least 2.92% and the Root Mean Square Error (RMSE) by at least 3.49%. These results indicate that I2TDM effectively improves the accuracy of missing value imputation for time series.

Key words: time series, missing value imputation, diffusion model, temporal attention, incremental imputation
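
The incremental idea described in the abstract, retaining part of the intermediate imputed values and reusing them in later stages, can be illustrated with a toy sketch. It is only an illustration under stated assumptions: the diffusion model is replaced by a trivial stand-in estimator (`neighbour_estimate`, the mean of the nearest known values on each side), and the selection rule (keep the missing positions closest to a known value) is a hypothetical placeholder, not the paper's incremental selection module. All function names are invented for this example.

```python
from statistics import mean


def neighbour_estimate(series, i):
    """Stand-in estimator (NOT the diffusion model): mean of the nearest
    known value to the left and to the right of position i."""
    left = next((series[j] for j in range(i - 1, -1, -1) if series[j] is not None), None)
    right = next((series[j] for j in range(i + 1, len(series)) if series[j] is not None), None)
    known = [v for v in (left, right) if v is not None]
    return mean(known) if known else 0.0


def distance_to_known(series, i):
    """Hypothetical selection score: distance from i to the nearest known value."""
    dists = [abs(i - j) for j, v in enumerate(series) if v is not None]
    return min(dists) if dists else 0


def incremental_impute(series, keep_per_stage=1):
    """Fill None entries stage by stage, retaining a few estimates per stage."""
    if keep_per_stage < 1:
        raise ValueError("keep_per_stage must be at least 1")
    filled = list(series)
    while any(v is None for v in filled):
        missing = [i for i, v in enumerate(filled) if v is None]
        # Stage-wise estimates for every position that is still missing.
        estimates = {i: neighbour_estimate(filled, i) for i in missing}
        # Retain only the best-scored estimates; the remaining positions are
        # re-estimated in the next stage, now conditioned on the retained values.
        ranked = sorted(missing, key=lambda i: distance_to_known(filled, i))
        for i in ranked[:keep_per_stage]:
            filled[i] = estimates[i]
    return filled


print(incremental_impute([0.0, None, None, None, 8.0]))
```

With `keep_per_stage=3` every gap is filled in a single stage and the toy estimator returns `[0.0, 4.0, 4.0, 4.0, 8.0]`; retaining one value per stage lets later estimates build on earlier ones and yields `[0.0, 4.0, 6.0, 7.0, 8.0]`.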

CLC Number: TP391.4

Cite this article: Xingjie FENG, Xingpeng BIAN, Xiaorong FENG, Xinglong WANG. Incremental missing value imputation algorithm for time series based on diffusion model[J]. Journal of Computer Applications, 2025, 45(8): 2582-2591.


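Algorithm	State of the ground truth at run time	State of the model-generated values at run time
Forecasting algorithm	Ground truth does not yet exist	Predicted values cannot be evaluated
Imputation algorithm	Ground truth exists but is unobservable	Imputed values can be evaluated indirectly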

Dataset	Number of samples	Dimensions	Original missing rate/%
AQI	8,760	36	13.3
ETT	17,421	6	0.0
Weather	52,697	21	0.0
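
The results tables on this page are reported at several missing ratios. A common way to obtain ground truth for such experiments is to hide a chosen fraction of the observed entries and score the imputations at the hidden positions. The minimal sketch below assumes that protocol, which this page does not spell out; `make_eval_mask` and its parameters are hypothetical names.

```python
import random


def make_eval_mask(values, ratio, seed=0):
    """Mark a fraction `ratio` of the observed entries as hidden.

    `values` uses None for entries that are missing in the raw data;
    those are never selected, because no ground truth exists for them.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    observed = [i for i, v in enumerate(values) if v is not None]
    rng = random.Random(seed)  # fixed seed so the mask is reproducible
    hidden = set(rng.sample(observed, round(len(observed) * ratio)))
    return [i in hidden for i in range(len(values))]


series = [1.0, None, 2.0, 3.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mask = make_eval_mask(series, ratio=0.5)
print(sum(mask), "of", sum(v is not None for v in series), "observed values hidden")
```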

Missing-value imputation ratio/%	Metric	Median	BRITS	GAIN	SAITS	CSDI	SSSD	PriSTI	I2TDM
10	MAE	65.83	12.47	26.99	10.29	7.13	11.96	7.90	6.83
10	RMSE	93.47	21.29	57.17	18.81	12.71	20.22	14.95	12.12
20	MAE	66.12	13.52	26.90	10.60	7.39	12.70	8.74	7.12
20	RMSE	92.07	22.63	57.07	19.31	13.22	21.51	16.90	12.53
50	MAE	67.13	18.86	27.50	12.34	8.64	14.73	12.04	8.40
50	RMSE	106.02	31.69	57.32	22.37	15.68	25.35	23.95	15.01
90	MAE	80.22	41.75	32.83	17.06	14.27	28.47	33.27	14.12
90	RMSE	124.21	62.27	60.91	30.33	24.70	46.34	59.44	24.74
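
MAE and RMSE, the two metrics reported in these tables, can be computed as in the short sketch below. It assumes the errors are measured only at positions flagged by an evaluation mask (entries whose true value is known but was hidden from the model); the function name and argument layout are illustrative, not taken from the paper.

```python
import math


def masked_mae_rmse(truth, imputed, eval_mask):
    """Return (MAE, RMSE) over the positions where eval_mask is True."""
    errors = [t - p for t, p, m in zip(truth, imputed, eval_mask) if m]
    if not errors:
        raise ValueError("eval_mask selects no positions")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


truth = [1.0, 2.0, 3.0, 4.0]
imputed = [1.0, 2.5, 2.0, 4.0]
eval_mask = [False, True, True, False]  # only positions 1 and 2 are scored
mae, rmse = masked_mae_rmse(truth, imputed, eval_mask)
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}")
```

Because RMSE squares each error before averaging, it penalises large individual errors more heavily than MAE, which is why the two metrics can rank models differently.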

Dataset	batch_size	epoch	loss	learning_rate	diff_steps	res_channels	n_samples
AQI	16	100	huber	0.0010	50	64	500
ETT	16	100	huber	0.0005	50	64	300
Weather	16	100	huber	0.0005	50	64	500
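
For reference, the settings in this table can be written as a plain configuration mapping, together with a scalar form of the Huber loss that the `loss` column names. This is a sketch only: the dictionary layout and the name `HYPERPARAMS` are illustrative, and the threshold `delta=1.0` is an assumption, since the table does not state it.

```python
# Values copied from the hyperparameter table; the layout is illustrative.
HYPERPARAMS = {
    "AQI": {"batch_size": 16, "epoch": 100, "loss": "huber",
            "learning_rate": 0.0010, "diff_steps": 50,
            "res_channels": 64, "n_samples": 500},
    "ETT": {"batch_size": 16, "epoch": 100, "loss": "huber",
            "learning_rate": 0.0005, "diff_steps": 50,
            "res_channels": 64, "n_samples": 300},
    "Weather": {"batch_size": 16, "epoch": 100, "loss": "huber",
                "learning_rate": 0.0005, "diff_steps": 50,
                "res_channels": 64, "n_samples": 500},
}


def huber_loss(error, delta=1.0):
    """Huber loss of one residual: quadratic near zero, linear in the tails.

    delta=1.0 is an assumed default; the source table gives no value.
    """
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * error * error
    return delta * (abs_err - 0.5 * delta)


print(HYPERPARAMS["ETT"]["learning_rate"], huber_loss(0.5), huber_loss(3.0))
```

The linear tail makes the loss less sensitive to outliers than a squared error, which is a common reason for choosing it on noisy sensor data.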

Missing-value imputation ratio/%	Metric	Median	BRITS	GAIN	SAITS	CSDI	SSSD	PriSTI	I2TDM
10	MAE	2.53	0.47	1.09	0.35	0.25	0.51	0.47	0.23
10	RMSE	4.63	1.05	2.80	0.97	0.48	1.17	0.91	0.43
20	MAE	2.57	0.54	1.13	0.38	0.29	0.54	0.52	0.27
20	RMSE	4.52	1.18	2.83	1.00	0.61	1.17	1.02	0.52
50	MAE	2.94	0.83	1.45	0.52	0.44	0.81	0.63	0.41
50	RMSE	4.71	1.66	3.22	1.18	1.00	1.89	1.26	0.90
90	MAE	3.72	2.29	3.21	1.17	1.07	1.83	1.51	1.14
90	RMSE	5.43	4.21	5.74	2.49	2.33	3.51	3.26	2.38

Missing-value imputation ratio/%	Metric	Median	BRITS	GAIN	SAITS	CSDI	SSSD	PriSTI	I2TDM
10	MAE	66.23	6.61	20.37	3.96	3.02	6.67	5.68	2.87
10	RMSE	188.88	35.81	100.09	28.97	21.21	39.36	37.74	19.40
20	MAE	77.34	9.12	19.79	4.31	3.39	7.96	5.77	3.27
20	RMSE	185.29	43.75	97.88	29.65	26.48	53.90	37.34	24.26
50	MAE	114.93	24.31	20.98	5.88	4.41	11.35	6.62	4.28
50	RMSE	256.64	79.73	101.36	38.76	32.37	65.57	42.20	31.08
90	MAE	165.42	67.90	50.01	11.33	9.07	29.38	15.71	8.71
90	RMSE	375.36	191.95	156.47	58.56	50.57	117.72	82.01	47.58

Missing rate/%	Metric	No TAM	No ISM	I2TDM
10	MAE	8.38	6.84	6.83
10	RMSE	16.17	12.23	12.12
50	MAE	12.85	8.45	8.40
50	RMSE	25.28	15.15	15.01
90	MAE	38.22	14.22	14.12
90	RMSE	61.43	24.80	24.74
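
The effect of each ablated component can be expressed as a relative error reduction. The sketch below applies that formula to the MAE values in the 10% row of the ablation table; the function name is illustrative, and reading "TAM" and "ISM" as the temporal attention module and the incremental selection module is an inference from the abstract rather than something this page states.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error metric relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


# MAE at a 10% missing rate, copied from the ablation table.
no_tam, no_ism, full = 8.38, 6.84, 6.83
print(f"vs. No TAM: {relative_reduction(no_tam, full):.2f}%")
print(f"vs. No ISM: {relative_reduction(no_ism, full):.2f}%")
```

On this row the full model lowers MAE by roughly 18.5% relative to the variant without TAM and by roughly 0.15% relative to the variant without ISM.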
