Multivariate time series prediction model based on decoupled attention mechanism

Journal of Computer Applications, 2024, Vol. 44, Issue (9): 2732-2738. DOI: 10.11772/j.issn.1001-9081.2023091301

• Data science and technology •

Liting LI¹, Bei HUA¹, Ruozhou HE¹, Kuang XU²

1. School of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230027, China
2. USTC Sinovate Software Company Limited, Hefei Anhui 230088, China

Received: 2023-09-20 Revised: 2023-12-05 Accepted: 2023-12-11 Online: 2024-02-07 Published: 2024-09-10
Contact: Bei HUA
About author: LI Liting, born in 1999, from Ningbo, Zhejiang, M. S. candidate. His research interests include deep neural networks, data mining and time series prediction.
HUA Bei, born in 1966, from Wuxi, Jiangsu, Ph. D., professor, CCF member. Her research interests include high-performance computing, network systems and time series prediction.
HE Ruozhou, born in 2000, from Huizhou, Guangdong, M. S. candidate. His research interests include deep neural networks, intelligent transportation systems and spatio-temporal prediction.
XU Kuang, born in 1982, from Bengbu, Anhui, M. S. His research interests include intelligent network orchestration and scheduling.
Supported by: National Key Research and Development Program of China (2018AAA0101200)

Abstract

To address the difficulty of fully exploiting both the contextual semantic information of sequences and the implicit correlations among variables in multivariate time series prediction, a model based on a decoupled attention mechanism, Decformer, was proposed. Firstly, a novel decoupled attention mechanism was proposed to make full use of the embedded semantic information, thereby improving the accuracy of attention weight allocation. Secondly, a pattern correlation mining method that does not rely on explicit variable relationships was proposed to mine and utilize the implicit pattern correlations among variables. On three real-world datasets of different types (TTV, ECL and PeMS-Bay, covering call traffic volume, electricity consumption and road traffic), Decformer achieved the highest prediction accuracy at all prediction horizons compared with strong open-source multivariate time series prediction models such as Long- and Short-term Time-series Network (LSTNet), Transformer and FEDformer. Compared with LSTNet, Decformer reduced the Mean Absolute Error (MAE) by 17.73%-27.32%, 10.89%-17.01% and 13.03%-19.64% on the TTV, ECL and PeMS-Bay datasets, respectively, and the Mean Squared Error (MSE) by 23.53%-58.96%, 16.36%-23.56% and 15.91%-26.30%, respectively. Experimental results indicate that Decformer can significantly improve the accuracy of multivariate time series prediction.

Key words: multivariate time series prediction, self-attention mechanism, pattern correlation, temporal correlation, embedding mechanism
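
The two error metrics named in the abstract, MAE and MSE, follow standard definitions. As a minimal sketch, assuming the errors are averaged uniformly over every time step and every variable of a forecast (this page does not state the exact averaging scheme or whether the data were normalized), they could be computed as below. The function names and the toy values are illustrative and are not taken from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE of a forecast given as rows (time steps) of per-variable values."""
    errors = [
        abs(t - p)
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    ]
    return sum(errors) / len(errors)


def mean_squared_error(y_true, y_pred):
    """MSE of a forecast given as rows (time steps) of per-variable values."""
    errors = [
        (t - p) ** 2
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    ]
    return sum(errors) / len(errors)


# Toy example: 2 time steps, 2 variables (hypothetical numbers).
actual = [[1.0, 2.0], [3.0, 4.0]]
forecast = [[1.5, 2.0], [2.0, 4.0]]

print(mean_absolute_error(actual, forecast))  # 0.375
print(mean_squared_error(actual, forecast))   # 0.3125
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE, which is one reason the two metrics can show different relative gaps between models.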

CLC Number: TP181

Liting LI, Bei HUA, Ruozhou HE, Kuang XU. Multivariate time series prediction model based on decoupled attention mechanism[J]. Journal of Computer Applications, 2024, 44(9): 2732-2738.

Prediction errors (MAE and MSE) of each model at prediction horizons 24, 48 and 96

TTV
Model        horizon 24      horizon 48      horizon 96
             MAE     MSE     MAE     MSE     MAE     MSE
LSTNet       0.151   0.054   0.141   0.051   0.194   0.134
MTGNN        0.163   0.063   0.173   0.070   0.182   0.084
Transformer  0.194   0.078   0.155   0.055   0.173   0.069
Informer     0.190   0.077   0.158   0.058   0.171   0.067
Autoformer   0.224   0.094   0.204   0.076   0.230   0.098
FEDformer    0.224   0.089   0.157   0.050   0.171   0.059
Triformer    0.219   0.098   0.155   0.053   0.175   0.063
Decformer    0.116   0.038   0.116   0.039   0.141   0.055

ECL
Model        horizon 24      horizon 48      horizon 96
             MAE     MSE     MAE     MSE     MAE     MSE
LSTNet       0.248   0.177   0.288   0.208   0.302   0.220
MTGNN        0.285   0.188   0.320   0.224   0.319   0.226
Transformer  0.359   0.275   0.375   0.296   0.393   0.326
Informer     0.367   0.286   0.422   0.378   0.474   0.447
Autoformer   0.289   0.171   0.310   0.193   0.315   0.200
FEDformer    0.288   0.173   0.288   0.173   0.307   0.195
Triformer    0.304   0.213   0.332   0.249   0.343   0.268
Decformer    0.221   0.144   0.239   0.159   0.255   0.184

PeMS-Bay
Model        horizon 24      horizon 48      horizon 96
             MAE     MSE     MAE     MSE     MAE     MSE
LSTNet       0.331   0.540   0.353   0.597   0.382   0.642
MTGNN        0.501   0.933   0.626   1.270   0.560   1.091
Transformer  0.355   0.470   0.367   0.542   0.380   0.573
Informer     0.361   0.485   0.402   0.625   0.450   0.744
Autoformer   0.406   0.667   0.594   1.144   0.708   1.400
FEDformer    0.364   0.505   0.403   0.586   0.420   0.611
Triformer    0.337   0.498   0.415   0.663   0.548   0.945
Decformer    0.266   0.398   0.307   0.502   0.328   0.538
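
The reduction ranges quoted in the abstract can be checked against the LSTNet and Decformer rows of the results table. The sketch below assumes the reduction is the relative change (baseline - model) / baseline, rounded to two decimals, and that each range spans the three horizons. This reading is inferred from the numbers rather than stated on the page, and the helper names are illustrative.

```python
# (MAE, MSE) pairs at horizons 24, 48 and 96, transcribed from the table.
LSTNET = {
    "TTV": [(0.151, 0.054), (0.141, 0.051), (0.194, 0.134)],
    "ECL": [(0.248, 0.177), (0.288, 0.208), (0.302, 0.220)],
    "PeMS-Bay": [(0.331, 0.540), (0.353, 0.597), (0.382, 0.642)],
}
DECFORMER = {
    "TTV": [(0.116, 0.038), (0.116, 0.039), (0.141, 0.055)],
    "ECL": [(0.221, 0.144), (0.239, 0.159), (0.255, 0.184)],
    "PeMS-Bay": [(0.266, 0.398), (0.307, 0.502), (0.328, 0.538)],
}


def reduction_pct(baseline, model):
    """Relative error reduction of `model` versus `baseline`, in percent."""
    return round((baseline - model) / baseline * 100, 2)


def reduction_range(dataset, metric_index):
    """(min, max) reduction over the three horizons; 0 selects MAE, 1 MSE."""
    values = [
        reduction_pct(base[metric_index], ours[metric_index])
        for base, ours in zip(LSTNET[dataset], DECFORMER[dataset])
    ]
    return min(values), max(values)


for name in LSTNET:
    print(name, "MAE", reduction_range(name, 0), "MSE", reduction_range(name, 1))
```

Under these assumptions the computed ranges match the abstract, for example 17.73% to 27.32% for MAE on TTV.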

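Model variant                            MAE     MSE     Average error increase/%
Decformer                                0.116   0.039   -
Using self-attention                     0.120   0.041   4.29
Without pattern embedding pre-training   0.118   0.040   2.14
Without pattern attention module         0.121   0.042   6.00
Without temporal attention module        0.118   0.041   3.43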
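
The last column of the ablation table is not defined on this page. The figures are consistent with taking the mean of the relative MAE increase and the relative MSE increase over the full Decformer row. The sketch below works under that assumption, which is inferred from the numbers and is not confirmed by the source. The variant labels and function name are illustrative.

```python
# Full-model errors from the first row of the ablation table.
BASE_MAE, BASE_MSE = 0.116, 0.039

# (MAE, MSE) of each ablated variant, transcribed from the table.
VARIANTS = {
    "using self-attention": (0.120, 0.041),
    "without pattern embedding pre-training": (0.118, 0.040),
    "without pattern attention module": (0.121, 0.042),
    "without temporal attention module": (0.118, 0.041),
}


def average_error_increase(mae, mse):
    """Mean of the relative MAE and MSE increases over the full model, in percent."""
    mae_up = (mae - BASE_MAE) / BASE_MAE * 100
    mse_up = (mse - BASE_MSE) / BASE_MSE * 100
    return round((mae_up + mse_up) / 2, 2)


for label, (mae, mse) in VARIANTS.items():
    print(label, average_error_increase(mae, mse))
```

Read this way, removing the pattern attention module costs the most accuracy (6.00%) and skipping pattern embedding pre-training the least (2.14%).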
