Incremental missing value imputation algorithm for time series based on diffusion model

doi:10.11772/j.issn.1001-9081.2024071046

Journal of Computer Applications, 2025, Vol. 45, Issue 8: 2582-2591.

Data science and technology

Xingjie FENG¹, Xingpeng BIAN¹, Xiaorong FENG², Xinglong WANG²

1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
2. College of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China

Received: 2024-07-26; Revised: 2024-09-29; Accepted: 2024-10-11; Online: 2024-11-19; Published: 2025-08-10
Contact: Xiaorong FENG
About the authors: FENG Xingjie, born in 1969 in Xingtai, Hebei, Ph.D., professor. His research interests include data warehousing and intelligent information processing.
BIAN Xingpeng, born in 1998 in Xinghua, Jiangsu, M.S. candidate, CCF member. His research interests include deep learning and flight data processing.
WANG Xinglong, born in 1979 in Bei'an, Heilongjiang, M.S., research fellow. His research interests include airspace operational safety and flight data processing.
Supported by: Key Program of National Natural Science Foundation of China (U2133207); General Program of National Natural Science Foundation of China (62173332); National Defense Technology 173 Project (2022-JCJQ-JJ-0874); Fundamental Research Funds for the Central Universities (3122020051); Postgraduate Scientific Research Innovation Project of Civil Aviation University of China (2023YJSKC05002)

Abstract

Missing data is a common problem in time series and complicates subsequent analysis, so effective missing value imputation is crucial for improving data quality and mining data value. However, existing imputation algorithms often reuse attention modules that were designed for complete data in time series prediction tasks, and these modules extract spatio-temporal features poorly from time series with missing values. In addition, existing imputation algorithms rarely study imputation patterns in depth and therefore underuse the intermediate values generated during the imputation process, which leaves room for improvement in imputation accuracy. To address these problems, an Incremental missing value Imputation algorithm for Time series based on Diffusion Model (I2TDM) was proposed. In I2TDM, a temporal attention module was incorporated into the traditional diffusion model to enhance feature extraction for time series with missing values. At the same time, a novel incremental imputation algorithm was designed that uses an incremental selection module to retain part of the intermediate imputed values, improving the stability and accuracy of imputation. Imputation experiments on 3 public datasets, Air Quality Index (AQI), Electricity Transformer Temperature (ETT) and Weather, show that compared with baseline models such as CSDI, SAITS and PriSTI, I2TDM reduces the Mean Absolute Error (MAE) by at least 2.92% and the Root Mean Square Error (RMSE) by at least 3.49%, which demonstrates the effectiveness of I2TDM in improving the missing value imputation accuracy of time series.

Key words: time series, missing value imputation, diffusion model, temporal attention, incremental imputation

CLC Number: TP391.4

Xingjie FENG, Xingpeng BIAN, Xiaorong FENG, Xinglong WANG. Incremental missing value imputation algorithm for time series based on diffusion model[J]. Journal of Computer Applications, 2025, 45(8): 2582-2591.

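Algorithm	State of the ground truth at run time	State of the model-generated values at run time
Prediction algorithm	Ground truth does not exist yet	Predicted values cannot be evaluated
Imputation algorithm	Ground truth exists but is unobservable	Imputed values can be evaluated indirectly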

Dataset	Number of sampling points	Dimensions	Original missing rate/%
AQI	8,760	36	13.3
ETT	17,421	6	0.0
Weather	52,697	21	0.0
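
The page does not spell out the evaluation protocol, so the following is only a minimal sketch under an assumed, commonly used setup: a chosen ratio of the observed values is hidden, an imputer fills them in, and MAE and RMSE are computed on the hidden positions only. Entries that were missing from the start (such as the 13.3% in AQI) have no ground truth and are left out of the score. The helper names `mask_observed`, `median_impute` and `mae_rmse` are hypothetical, and the median imputer is just a stand-in for any imputation model.

```python
import math
import random
import statistics


def mask_observed(series, ratio, seed=0):
    """Hide `ratio` of the observed (non-None) entries.

    Returns the masked copy and the sorted indices that were hidden.
    Entries that are already None are never selected.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(series) if v is not None]
    hidden = set(rng.sample(observed, int(round(ratio * len(observed)))))
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, sorted(hidden)


def median_impute(masked):
    """Stand-in imputer: fill every missing entry with the median of the rest."""
    med = statistics.median(v for v in masked if v is not None)
    return [med if v is None else v for v in masked]


def mae_rmse(truth, imputed, hidden):
    """MAE and RMSE restricted to the artificially hidden positions."""
    errors = [imputed[i] - truth[i] for i in hidden]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Toy series with two originally missing entries (indices 2 and 6).
series = [10.0, 12.0, None, 11.0, 13.0, 12.0, None, 14.0, 15.0, 13.0]
masked, hidden = mask_observed(series, ratio=0.5)
mae, rmse = mae_rmse(series, median_impute(masked), hidden)
print(f"hidden={hidden} MAE={mae:.3f} RMSE={rmse:.3f}")
```

Scoring only on the hidden positions keeps the comparison fair across datasets whose original missing rates differ.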

Missing value imputation ratio/%	Metric	Median	BRITS	GAIN	SAITS	CSDI	SSSD	PriSTI	I2TDM
10	MAE	65.83	12.47	26.99	10.29	7.13	11.96	7.90	6.83
10	RMSE	93.47	21.29	57.17	18.81	12.71	20.22	14.95	12.12
20	MAE	66.12	13.52	26.90	10.60	7.39	12.70	8.74	7.12
20	RMSE	92.07	22.63	57.07	19.31	13.22	21.51	16.90	12.53
50	MAE	67.13	18.86	27.50	12.34	8.64	14.73	12.04	8.40
50	RMSE	106.02	31.69	57.32	22.37	15.68	25.35	23.95	15.01
90	MAE	80.22	41.75	32.83	17.06	14.27	28.47	33.27	14.12
90	RMSE	124.21	62.27	60.91	30.33	24.70	46.34	59.44	24.74
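
As a worked check on the table above, the relative error reduction of I2TDM against the CSDI column can be computed per row and then averaged over the four ratios. The page does not say how the abstract's "at least 2.92%" and "at least 3.49%" were derived, so treating them as this kind of average is an assumption; the function name `mean_relative_reduction` is hypothetical.

```python
def mean_relative_reduction(baseline, proposed):
    """Average of per-row relative reductions, in percent.

    Each row contributes (baseline - proposed) / baseline * 100;
    a negative contribution means the proposed method was worse on that row.
    """
    reductions = [(b - p) / b * 100.0 for b, p in zip(baseline, proposed)]
    return sum(reductions) / len(reductions)


# Values copied from the CSDI and I2TDM columns (ratios 10, 20, 50, 90).
csdi_mae = [7.13, 7.39, 8.64, 14.27]
i2tdm_mae = [6.83, 7.12, 8.40, 14.12]
csdi_rmse = [12.71, 13.22, 15.68, 24.70]
i2tdm_rmse = [12.12, 12.53, 15.01, 24.74]

mae_gain = mean_relative_reduction(csdi_mae, i2tdm_mae)
rmse_gain = mean_relative_reduction(csdi_rmse, i2tdm_rmse)
print(f"{mae_gain:.2f} {rmse_gain:.2f}")  # prints "2.92 3.49"
```

Under this reading the averages come out at 2.92% for MAE and 3.49% for RMSE. Note that the 90% RMSE row contributes a small negative term, since I2TDM (24.74) is marginally behind CSDI (24.70) there.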

Dataset	batch_size	epoch	loss	learning_rate	diff_steps	res_channels	n_samples
AQI	16	100	huber	0.0010	50	64	500
ETT	16	100	huber	0.0005	50	64	300
Weather	16	100	huber	0.0005	50	64	500
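
The settings in this table could be carried into code roughly as follows. This is a sketch only: the `TrainConfig` class and `CONFIGS` mapping are hypothetical names, the field names simply mirror the table headers, and the comments on what each field means are presumed readings rather than definitions taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """One row of the hyperparameter table (field names mirror its headers)."""
    batch_size: int
    epoch: int
    loss: str             # "huber" in every row of the table
    learning_rate: float
    diff_steps: int       # presumably the number of diffusion steps
    res_channels: int     # presumably the residual channel width
    n_samples: int        # presumably the number of samples drawn per imputation


CONFIGS = {
    "AQI": TrainConfig(16, 100, "huber", 0.0010, 50, 64, 500),
    "ETT": TrainConfig(16, 100, "huber", 0.0005, 50, 64, 300),
    "Weather": TrainConfig(16, 100, "huber", 0.0005, 50, 64, 500),
}

for name, cfg in CONFIGS.items():
    print(name, cfg.learning_rate, cfg.n_samples)
```

Only `learning_rate` and `n_samples` differ between datasets; the remaining values are shared across all three rows.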

Missing value imputation ratio/%	Metric	Median	BRITS	GAIN	SAITS	CSDI	SSSD	PriSTI	I2TDM
10	MAE	2.53	0.47	1.09	0.35	0.25	0.51	0.47	0.23
10	RMSE	4.63	1.05	2.80	0.97	0.48	1.17	0.91	0.43
20	MAE	2.57	0.54	1.13	0.38	0.29	0.54	0.52	0.27
20	RMSE	4.52	1.18	2.83	1.00	0.61	1.17	1.02	0.52
50	MAE	2.94	0.83	1.45	0.52	0.44	0.81	0.63	0.41
50	RMSE	4.71	1.66	3.22	1.18	1.00	1.89	1.26	0.90
90	MAE	3.72	2.29	3.21	1.17	1.07	1.83	1.51	1.14
90	RMSE	5.43	4.21	5.74	2.49	2.33	3.51	3.26	2.38

Missing value imputation ratio/%	Metric	Median	BRITS	GAIN	SAITS	CSDI	SSSD	PriSTI	I2TDM
10	MAE	66.23	6.61	20.37	3.96	3.02	6.67	5.68	2.87
10	RMSE	188.88	35.81	100.09	28.97	21.21	39.36	37.74	19.40
20	MAE	77.34	9.12	19.79	4.31	3.39	7.96	5.77	3.27
20	RMSE	185.29	43.75	97.88	29.65	26.48	53.90	37.34	24.26
50	MAE	114.93	24.31	20.98	5.88	4.41	11.35	6.62	4.28
50	RMSE	256.64	79.73	101.36	38.76	32.37	65.57	42.20	31.08
90	MAE	165.42	67.90	50.01	11.33	9.07	29.38	15.71	8.71
90	RMSE	375.36	191.95	156.47	58.56	50.57	117.72	82.01	47.58

Missing rate/%	Metric	No TAM	No ISM	I2TDM
10	MAE	8.38	6.84	6.83
10	RMSE	16.17	12.23	12.12
50	MAE	12.85	8.45	8.40
50	RMSE	25.28	15.15	15.01
90	MAE	38.22	14.22	14.12
90	RMSE	61.43	24.80	24.74