Unsupervised time series anomaly detection model based on re-encoding

doi:10.11772/j.issn.1001-9081.2022010006

Abstract

To address the low anomaly detection accuracy caused by data imbalance and the highly complex temporal correlation of time series, an unsupervised, re-encoding-based time series anomaly detection model built on the Generative Adversarial Network (GAN), named RTGAN (Re-encoding Time series based on GAN), was proposed. Firstly, multiple generators with cycle consistency were used to ensure the diversity of the generated samples and thereby learn different anomaly patterns. Secondly, a stacked Long Short-Term Memory-dropout Recurrent Neural Network (LSTM-dropout RNN) was used to capture temporal correlation. Thirdly, an improved re-encoding step compared the generated samples with the real samples in the latent space. These differences, taken as re-encoding errors, served as part of the anomaly score to improve detection accuracy. Finally, the new anomaly score was used to detect anomalies on univariate and multivariate time series datasets. The proposed model was compared with seven baseline anomaly detection models on univariate and multivariate time series. Experimental results show that the proposed model achieves the highest average F1-score across all datasets (0.815). Its average F1-score is 36.29% and 8.52% higher (relative) than those of the original AutoEncoder (AE) model Dense-AE (Dense-AutoEncoder) and the latest benchmark model USAD (UnSupervised Anomaly Detection on multivariate time series), respectively. The robustness of the model was evaluated under different Signal-to-Noise Ratios (SNRs). The results show that the proposed model consistently outperforms LSTM-VAE (Variational AutoEncoder based on LSTM), USAD and OmniAnomaly. In particular, at 30% SNR the F1-score of RTGAN is 13.53% and 10.97% higher than those of USAD and OmniAnomaly, respectively. These results indicate that RTGAN can effectively improve the accuracy and robustness of anomaly detection.

Key words: Generative Adversarial Network (GAN), anomaly detection, time series, stacked Long Short-Term Memory (LSTM) network, AutoEncoder (AE), re-encoding
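The abstract does not give the exact formula for the anomaly score. What follows is therefore only a minimal sketch under stated assumptions, not the authors' method.

- The score is assumed to be a weighted sum, with a hypothetical weight `alpha`, of two terms: the reconstruction error in data space and the re-encoding error in latent space.
- The toy functions `encode` and `generate` are hypothetical stand-ins for the trained encoder and generators. The generator saturates at 1.0 to mimic a model that has only learned the normal value range, so out-of-range windows also produce a latent discrepancy.
- F1 is reported as the best value over thresholds drawn from the observed scores. This is one common evaluation protocol, but it is an assumption here.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def encode(x):
    """Hypothetical toy encoder E: maps a window to a 1-D latent code (its mean)."""
    return [sum(x) / len(x)]

def generate(z):
    """Hypothetical toy generator G: decodes a latent code into a 2-point window,
    saturating at 1.0 as if it had only learned the normal value range."""
    v = min(z[0], 1.0)
    return [v, v]

def anomaly_score(x, alpha=0.5):
    """Assumed score: alpha * reconstruction error + (1 - alpha) * re-encoding error."""
    z = encode(x)             # latent code of the real window
    x_hat = generate(z)       # generated (reconstructed) window
    z_hat = encode(x_hat)     # re-encode the generated window
    rec_err = l2(x, x_hat)    # discrepancy in data space
    reenc_err = l2(z, z_hat)  # discrepancy in latent space (re-encoding error)
    return alpha * rec_err + (1 - alpha) * reenc_err

def f1_at(scores, labels, threshold):
    """F1-score when windows with score > threshold are flagged as anomalous."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_f1(scores, labels):
    """Best F1 over thresholds taken from the observed scores."""
    return max(f1_at(scores, labels, t) for t in sorted(set(scores)))

# Toy windows: the last two lie outside the range the generator can produce
windows = [[0.4, 0.6], [0.2, 0.3], [0.9, 0.8], [3.0, 3.0], [2.5, 4.0]]
labels = [0, 0, 0, 1, 1]
scores = [anomaly_score(w) for w in windows]
print(best_f1(scores, labels))
```

On these toy windows the out-of-range inputs receive a nonzero re-encoding error, while the normal windows receive none. One threshold therefore separates the two groups perfectly, and the script prints 1.0.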

CLC Number: TP391.1

Chunyong YIN, Liwen ZHOU. Unsupervised time series anomaly detection model based on re-encoding[J]. Journal of Computer Applications, 2023, 43(3): 804-811.

References

1	CHOI K, YI J, PARK C, et al. Deep learning for anomaly detection in time-series data: review, analysis, and guidelines[J]. IEEE Access, 2021, 9: 120043-120065. 10.1109/ACCESS.2021.3107975
2	YANG J F, SUN Y, LIANG J, et al. Image captioning by incorporating affective concepts learned from both visual and textual components[J]. Neurocomputing, 2019, 328: 56-68. 10.1016/j.neucom.2018.03.078
3	LI X X, KANG Y F, LI F. Forecasting with time series imaging[J]. Expert Systems with Applications, 2020, 160: No.113680. 10.1016/j.eswa.2020.113680
4	LI D, CHEN D C, JIN B H, et al. MAD-GAN: multivariate anomaly detection for time series data with generative adversarial networks[C]// Proceedings of the 2019 International Conference on Artificial Neural Networks, LNCS 11730. Cham: Springer, 2019: 703-716.
5	AUDIBERT J, MICHIARDI P, GUYARD F, et al. USAD: unsupervised anomaly detection on multivariate time series[C]// Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: ACM, 2020: 3395-3404. 10.1145/3394486.3403392
6	FAN C, XIAO F, ZHAO Y, et al. Analytical investigation of autoencoder-based methods for unsupervised anomaly detection in building energy data[J]. Applied Energy, 2018, 211: 1123-1135. 10.1016/j.apenergy.2017.12.005
7	YAO R, LIU C D, ZHANG L X, et al. Unsupervised anomaly detection using variational auto-encoder based feature extraction[C]// Proceedings of the 2019 IEEE International Conference on Prognostics and Health Management. Piscataway: IEEE, 2019: 1-7. 10.1109/icphm.2019.8819434
8	SCHLEGL T, SEEBÖCK P, WALDSTEIN S M, et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery[C]// Proceedings of the 2017 International Conference on Information Processing in Medical Imaging, LNCS 10265. Cham: Springer, 2017: 146-157.
9	DENDORFER P, ELFLEIN S, LEAL-TAIXÉ L. MG-GAN: a multi-generator model preventing out-of-distribution samples in pedestrian trajectory prediction[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 13138-13147. 10.1109/iccv48922.2021.01291
10	KIEU T, YANG B, JENSEN C S. Outlier detection for multidimensional time series using deep neural networks[C]// Proceedings of the 19th IEEE International Conference on Mobile Data Management. Piscataway: IEEE, 2018: 125-134. 10.1109/mdm.2018.00029
11	REN H S, XU B X, WANG Y J, et al. Time-series anomaly detection service at Microsoft[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2019: 3009-3017. 10.1145/3292500.3330680
12	COOK A A, MISIRLI G, FAN Z. Anomaly detection for IoT time-series data: a survey[J]. IEEE Internet of Things Journal, 2020, 7(7): 6481-6494. 10.1109/jiot.2019.2958185
13	RAMASWAMY S, RASTOGI R, SHIM K. Efficient algorithms for mining outliers from large data sets[C]// Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2000: 427-438. 10.1145/335191.335437
14	BREUNIG M M, KRIEGEL H P, NG R T, et al. LOF: identifying density-based local outliers[C]// Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2000: 93-104. 10.1145/335191.335388
15	ZARE MOAYEDI H, MASNADI-SHIRAZI M A. ARIMA model for network traffic prediction and anomaly detection[C]// Proceedings of the 2008 International Symposium on Information Technology. Piscataway: IEEE, 2008: 1-6. 10.1109/itsim.2008.4631947
16	HE Q P, QIN S J, WANG J. A new fault diagnosis method using fault directions in Fisher discriminant analysis[J]. AIChE Journal, 2005, 51(2): 555-571. 10.1002/aic.10325
17	AHMAD S, LAVIN A, PURDY S, et al. Unsupervised real-time anomaly detection for streaming data[J]. Neurocomputing, 2017, 262: 134-147. 10.1016/j.neucom.2017.04.070
18	RINGBERG H, SOULE A, REXFORD J, et al. Sensitivity of PCA for traffic anomaly detection[C]// Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. New York: ACM, 2007: 109-120. 10.1145/1254882.1254895
19	DAI L, LIN T, LIU C, et al. SDFVAE: static and dynamic factorized VAE for anomaly detection of multivariate CDN KPIs[C]// Proceedings of the 2021 World Wide Web Conference. New York: ACM, 2021: 3076-3086. 10.1145/3442381.3450013
20	HUO W G, WANG H F. Time series anomaly detection method based on autoencoder and HMM[J]. Journal of Computer Applications, 2020, 40(5): 1329-1334. (in Chinese)
21	VON SCHLEINITZ J, GRAF M, TRUTSCHNIG W, et al. VASP: an autoencoder-based approach for multivariate anomaly detection and robust time series prediction with application in motorsport[J]. Engineering Applications of Artificial Intelligence, 2021, 104: No.104354. 10.1016/j.engappai.2021.104354
22	GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. Cambridge: MIT Press, 2014: 2672-2680.
23	YOON J, JARRETT D, VAN DER SCHAAR M. Time-series generative adversarial networks[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2021-09-21].
24	WANG J, ZOU H M, QU D D, et al. Financial time series prediction based on empirical mode decomposition to generate adversarial networks[J]. Computer Applications and Software, 2020, 37(5): 293-297. 10.3969/j.issn.1000-386x.2020.05.050 (in Chinese)
25	GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 5769-5779.
26	SU Y, ZHAO Y J, NIU C H, et al. Robust anomaly detection for multivariate time series through stochastic recurrent neural network[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2019: 2828-2837.

Dataset	Total samples	Training set	Test set	Anomaly ratio (%)	Features
SWaT	946,719	568,031	378,688	11.98	51
WADI	577,658	346,594	231,064	5.99	127
SMAP	562,800	337,680	225,120	13.13	25
MSL	132,046	79,227	52,819	10.72	55
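The split sizes in the dataset table can be checked directly. In every dataset, the training and test sets together cover all samples. The training set also has exactly ⌊0.6N⌋ samples for N total samples, which amounts to a 60/40 split. How the samples were assigned to each side (for example, chronologically) is not stated here, so the check below covers only the counts:

```python
# (total samples, training samples, test samples), copied from the dataset table
splits = {
    "SWaT": (946719, 568031, 378688),
    "WADI": (577658, 346594, 231064),
    "SMAP": (562800, 337680, 225120),
    "MSL": (132046, 79227, 52819),
}

for name, (total, train, test) in splits.items():
    assert train + test == total          # the two splits cover every sample
    assert train == total * 6 // 10       # training set = floor(60% of the total), integer arithmetic
    print(f"{name}: train {train / total:.4f}, test {test / total:.4f}")
```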

Model	SMAP (univariate)	MSL (univariate)	SWaT (multivariate)	WADI (multivariate)	Best average F1
RTGAN	0.861	0.927	0.853	0.617	0.815
Dense-AE	0.729	0.483	0.824	0.354	0.598
IF	0.473	0.612	0.738	0.315	0.535
USAD	0.817	0.911	0.846	0.430	0.751
DAGMM	0.764	0.852	0.797	0.201	0.654
LSTM-VAE	0.684	0.579	0.804	0.380	0.612
MAD-GAN	0.381	0.124	0.810	0.624	0.485
OmniAnomaly	0.853	0.901	0.833	0.406	0.748
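The last column of the F1 table is the arithmetic mean of the four per-dataset F1-scores, rounded to three decimals. The percentages quoted in the abstract are relative gains computed from those rounded means; for example, (0.815 − 0.598) / 0.598 ≈ 36.29% over Dense-AE. RTGAN has the best mean but not the best score on every dataset: MAD-GAN is slightly higher on WADI (0.624 vs 0.617). The sketch below reproduces these calculations from the table values; the helper name `relative_gain` is illustrative.

```python
# Per-dataset F1-scores (SMAP, MSL, SWaT, WADI) and reported averages, copied from the table
f1 = {
    "RTGAN": ([0.861, 0.927, 0.853, 0.617], 0.815),
    "Dense-AE": ([0.729, 0.483, 0.824, 0.354], 0.598),
    "IF": ([0.473, 0.612, 0.738, 0.315], 0.535),
    "USAD": ([0.817, 0.911, 0.846, 0.430], 0.751),
    "DAGMM": ([0.764, 0.852, 0.797, 0.201], 0.654),
    "LSTM-VAE": ([0.684, 0.579, 0.804, 0.380], 0.612),
    "MAD-GAN": ([0.381, 0.124, 0.810, 0.624], 0.485),
    "OmniAnomaly": ([0.853, 0.901, 0.833, 0.406], 0.748),
}

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100

for model, (scores, reported) in f1.items():
    mean = sum(scores) / len(scores)
    # The reported average matches the mean up to rounding to three decimals
    assert abs(mean - reported) <= 0.0005 + 1e-9
    print(f"{model}: mean {mean:.4f}, reported {reported}")

# Relative gains quoted in the abstract, computed from the rounded averages
print(f"RTGAN vs Dense-AE: {relative_gain(0.815, 0.598):.2f}%")
print(f"RTGAN vs USAD: {relative_gain(0.815, 0.751):.2f}%")

# Best model on WADI alone (index 3 of each score list)
print(max(f1, key=lambda m: f1[m][0][3]))
```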

SNR (%)	LSTM-VAE F1	USAD F1	OmniAnomaly F1	RTGAN F1
10	0.591	0.746	0.731	0.803
20	0.557	0.725	0.705	0.794
30	0.497	0.695	0.711	0.789
40	0.468	0.676	0.691	0.763
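The robustness claims can be checked against the SNR table in the same way. RTGAN has the highest F1-score in every row. At 30% SNR, its relative gains over USAD and OmniAnomaly are (0.789 − 0.695) / 0.695 ≈ 13.53% and (0.789 − 0.711) / 0.711 ≈ 10.97%. A short check (variable names are illustrative):

```python
# F1-scores per SNR level, copied from the SNR table
methods = ["LSTM-VAE", "USAD", "OmniAnomaly", "RTGAN"]
table = {
    10: [0.591, 0.746, 0.731, 0.803],
    20: [0.557, 0.725, 0.705, 0.794],
    30: [0.497, 0.695, 0.711, 0.789],
    40: [0.468, 0.676, 0.691, 0.763],
}

# Best method at each SNR level
best_per_row = {snr: methods[row.index(max(row))] for snr, row in table.items()}
print(best_per_row)

# Relative gains at 30% SNR quoted in the abstract
usad, omni, rtgan = table[30][1], table[30][2], table[30][3]
print(f"vs USAD: {(rtgan - usad) / usad * 100:.2f}%")
print(f"vs OmniAnomaly: {(rtgan - omni) / omni * 100:.2f}%")
```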