Unsupervised time series anomaly detection model based on re-encoding

doi:10.11772/j.issn.1001-9081.2022010006

Journal of Computer Applications, 2023, Vol. 43, Issue (3): 804-811.

Chunyong YIN, Liwen ZHOU

School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing Jiangsu 210044, China

Received: 2022-01-06 Revised: 2022-04-28 Accepted: 2022-04-29 Online: 2022-05-07 Published: 2023-03-10
Corresponding author: Chunyong YIN
About the authors: YIN Chunyong, born in 1977 in Weifang, Shandong, Ph. D., professor and doctoral supervisor. His research interests include cyberspace security, big data mining, privacy protection, artificial intelligence, and novel computing.
ZHOU Liwen, born in 1996 in Shuyang, Jiangsu, M. S. candidate. His research interests include anomaly detection, deep learning, big data mining, and adversarial attacks.

Abstract

To address the low anomaly detection accuracy caused by data imbalance and the highly complex temporal correlations in time series, an unsupervised time series anomaly detection model based on re-encoding, named RTGAN (Re-encoding Time series based on GAN), was proposed on the basis of the Generative Adversarial Network (GAN). Firstly, multiple generators with cycle consistency were used to ensure the diversity of the generated samples and thereby learn different anomaly patterns. Secondly, a stacked Long Short-Term Memory-dropout Recurrent Neural Network (LSTM-dropout RNN) was used to capture temporal correlations. Thirdly, re-encoding (a second encoding pass) was used to compare the generated samples with the real samples in the latent space, and the resulting difference, the re-encoding error, served as one part of the anomaly score to improve detection accuracy. Finally, the new anomaly score was used to detect anomalies on univariate and multivariate time series datasets. The proposed model was compared with seven baseline anomaly detection models on univariate and multivariate time series. Experimental results show that the proposed model obtains the highest F1-score averaged over all datasets (0.815), and that its overall performance is 36.29% and 8.52% higher than that of the original AutoEncoder (AE) model Dense-AE (Dense-AutoEncoder) and the recent benchmark model USAD (UnSupervised Anomaly Detection on multivariate time series), respectively. The robustness of the model was tested by adding noise at different Signal-to-Noise Ratios (SNRs). The results show that the proposed model consistently outperforms LSTM-VAE (Variational Autoencoder based on LSTM), USAD and OmniAnomaly; in particular, at an SNR of 30%, the F1-score of RTGAN is 13.53% and 10.97% higher than those of USAD and OmniAnomaly, respectively. It can be seen that RTGAN can effectively improve the accuracy and robustness of anomaly detection.

Keywords: Generative Adversarial Network (GAN), anomaly detection, time series, stacked Long Short-Term Memory (LSTM) network, AutoEncoder (AE), re-encoding
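
The abstract states only that the re-encoding error, measured in the latent space, forms one part of the anomaly score. The following is a minimal sketch of that idea, not the authors' implementation: the squared-distance metric, the weighted-sum form, the mixing weight `alpha`, and the toy `toy_encode` and `toy_generate` functions are all assumptions introduced for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def anomaly_score(
    x: Vector,
    encode: Callable[[Vector], Vector],
    generate: Callable[[Vector], Vector],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> float:
    """Combine a reconstruction error with a latent-space re-encoding error."""
    z = encode(x)          # first encoding of the real window
    x_hat = generate(z)    # generated window
    z_hat = encode(x_hat)  # re-encoding of the generated window
    reconstruction_error = squared_l2(x, x_hat)
    re_encoding_error = squared_l2(z, z_hat)
    return alpha * reconstruction_error + (1.0 - alpha) * re_encoding_error


def toy_encode(x: Vector) -> List[float]:
    """Toy stand-in for a trained encoder: a one-dimensional latent code."""
    return [max(x)]


def toy_generate(z: Vector) -> List[float]:
    """Toy stand-in for a trained generator that only produces values <= 2."""
    return [min(z[0], 2.0)] * 4


normal_window = [1.0, 1.0, 1.0, 1.0]
spike_window = [1.0, 1.0, 1.0, 5.0]
normal_score = anomaly_score(normal_window, toy_encode, toy_generate)
spike_score = anomaly_score(spike_window, toy_encode, toy_generate)
```

In this toy setting the generator cannot reproduce the spike, so both the reconstruction error and the re-encoding error grow for `spike_window`, while `normal_window` scores zero.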

CLC number: TP391.1

Chunyong YIN, Liwen ZHOU. Unsupervised time series anomaly detection model based on re-encoding[J]. Journal of Computer Applications, 2023, 43(3): 804-811.

Figures/Tables (9)

Fig. 1 Framework of RTGAN model

Fig. 2 Framework of stacked LSTM-dropout RNN

Tab. 1 Statistics of four datasets

Dataset	Total samples	Training set	Test set	Anomaly ratio/%	Number of features
SWaT	946 719	568 031	378 688	11.98	51
WADI	577 658	346 594	231 064	5.99	127
SMAP	562 800	337 680	225 120	13.13	25
MSL	132 046	79 227	52 819	10.72	55

Tab. 2 Comparison of F1-scores for anomaly detection

Model	SMAP (univariate)	MSL (univariate)	SWaT (multivariate)	WADI (multivariate)	Best average F1
RTGAN	0.861	0.927	0.853	0.617	0.815
Dense-AE	0.729	0.483	0.824	0.354	0.598
IF	0.473	0.612	0.738	0.315	0.535
USAD	0.817	0.911	0.846	0.430	0.751
DAGMM	0.764	0.852	0.797	0.201	0.654
LSTM-VAE	0.684	0.579	0.804	0.380	0.612
MAD-GAN	0.381	0.124	0.810	0.624	0.485
OmniAnomaly	0.853	0.901	0.833	0.406	0.748
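
The average column and the relative improvements quoted in the abstract can be recomputed from the per-dataset values in Tab. 2. The sketch below assumes that each average is the plain mean of the four F1-scores rounded half-up to three decimals, and that the percentages are relative gains computed from those rounded averages; both are inferences that happen to reproduce the published numbers, not procedures stated on this page.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-dataset F1-scores copied from Tab. 2, in the order SMAP, MSL, SWaT, WADI.
F1_SCORES = {
    "RTGAN": ("0.861", "0.927", "0.853", "0.617"),
    "Dense-AE": ("0.729", "0.483", "0.824", "0.354"),
    "IF": ("0.473", "0.612", "0.738", "0.315"),
    "USAD": ("0.817", "0.911", "0.846", "0.430"),
    "DAGMM": ("0.764", "0.852", "0.797", "0.201"),
    "LSTM-VAE": ("0.684", "0.579", "0.804", "0.380"),
    "MAD-GAN": ("0.381", "0.124", "0.810", "0.624"),
    "OmniAnomaly": ("0.853", "0.901", "0.833", "0.406"),
}


def average_f1(scores):
    """Mean of the F1-scores, rounded half-up to three decimals."""
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


def relative_gain_percent(new, base):
    """Relative improvement of `new` over `base`, in percent (two decimals)."""
    gain = (new / base - 1) * 100
    return gain.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {model: average_f1(scores) for model, scores in F1_SCORES.items()}
gain_over_dense_ae = relative_gain_percent(averages["RTGAN"], averages["Dense-AE"])
gain_over_usad = relative_gain_percent(averages["RTGAN"], averages["USAD"])

# Index 3 is the WADI column; RTGAN is not the top model on this one dataset.
best_on_wadi = max(F1_SCORES, key=lambda model: Decimal(F1_SCORES[model][3]))
```

Under these assumptions the two gains come out at 36.29% and 8.52%, matching the abstract. The last line shows that MAD-GAN scores highest on WADI, so the "highest" claim refers to the average across datasets rather than to every individual dataset.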

Fig. 3 Anomaly detection on samples consisting of four datasets

Fig. 4 Convergence results on NASA datasets

Tab. 3 Average F1-scores when adding noise at different SNR to original data

SNR/%	LSTM-VAE	USAD	OmniAnomaly	RTGAN
10	0.591	0.746	0.731	0.803
20	0.557	0.725	0.705	0.794
30	0.497	0.695	0.711	0.789
40	0.468	0.676	0.691	0.763
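
The robustness claims in the abstract can be checked against Tab. 3 in the same way. This sketch assumes the quoted percentages are relative gains in average F1 at the 30% setting; the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

MODELS = ("LSTM-VAE", "USAD", "OmniAnomaly", "RTGAN")

# Average F1-scores copied from Tab. 3, keyed by SNR/%.
TABLE_3 = {
    10: ("0.591", "0.746", "0.731", "0.803"),
    20: ("0.557", "0.725", "0.705", "0.794"),
    30: ("0.497", "0.695", "0.711", "0.789"),
    40: ("0.468", "0.676", "0.691", "0.763"),
}


def f1(snr, model):
    """Look up one cell of Tab. 3 as an exact decimal."""
    return Decimal(TABLE_3[snr][MODELS.index(model)])


def relative_gain_percent(new, base):
    """Relative improvement of `new` over `base`, in percent (two decimals)."""
    gain = (new / base - 1) * 100
    return gain.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# "Consistently outperforms": RTGAN has the highest F1 in every row.
rtgan_always_best = all(
    max(MODELS, key=lambda model: f1(snr, model)) == "RTGAN" for snr in TABLE_3
)

gain_vs_usad_at_30 = relative_gain_percent(f1(30, "RTGAN"), f1(30, "USAD"))
gain_vs_omni_at_30 = relative_gain_percent(f1(30, "RTGAN"), f1(30, "OmniAnomaly"))
```

With the table values as given, RTGAN leads in all four rows, and the two gains at 30% evaluate to 13.53% and 10.97%, in agreement with the abstract.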

Fig. 5 Overall F1-scores of each variant on all benchmark datasets when the number of generators ranges from 1 to 6

Fig. 6 Ratio by which the F1-score of the combined model exceeds that of using only the re-encoder or only the stacked LSTM-dropout RNN

