Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 804-811. DOI: 10.11772/j.issn.1001-9081.2022010006

• Cyber security •

Unsupervised time series anomaly detection model based on re-encoding

Chunyong YIN, Liwen ZHOU

  1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing Jiangsu 210044, China
  • Received: 2022-01-06 Revised: 2022-04-28 Accepted: 2022-04-29 Online: 2022-05-07 Published: 2023-03-10
  • Contact: Chunyong YIN
  • About author: YIN Chunyong, born in 1977 in Weifang, Shandong, Ph. D., professor, doctoral supervisor. His research interests include cyberspace security, big data mining, privacy protection, artificial intelligence, new computing.
    ZHOU Liwen, born in 1996 in Shuyang, Jiangsu, M. S. candidate. His research interests include anomaly detection, deep learning, big data mining, adversarial attack.

Abstract:

To address the low accuracy of anomaly detection caused by data imbalance and the highly complex temporal correlation of time series, an unsupervised time series anomaly detection model based on re-encoding and built on Generative Adversarial Network (GAN), named RTGAN (Re-encoding Time series based on GAN), was proposed. Firstly, multiple generators with cycle consistency were used to ensure the diversity of generated samples and thereby learn different anomaly patterns. Secondly, the stacked Long Short-Term Memory-dropout Recurrent Neural Network (LSTM-dropout RNN) was used to capture temporal correlation. Thirdly, the differences between the generated samples and the real samples were compared in the latent space by improved re-encoding, and these differences, as the re-encoding errors, served as a part of the anomaly score to improve the accuracy of anomaly detection. Finally, the new anomaly score was used to detect anomalies on univariate and multivariate time series datasets. The proposed model was compared with seven baseline anomaly detection models on univariate and multivariate time series. Experimental results show that the proposed model obtains the highest average F1-score (0.815) on all datasets, and that its overall performance is 36.29% and 8.52% higher than those of the original AutoEncoder (AE) model Dense-AE (Dense-AutoEncoder) and the latest benchmark model USAD (UnSupervised Anomaly Detection on multivariate time series), respectively. The robustness of the model was tested under different Signal-to-Noise Ratios (SNRs). The results show that the proposed model consistently outperforms LSTM-VAE (Variational Autoencoder based on LSTM), USAD and OmniAnomaly. In particular, at an SNR of 30%, the F1-score of RTGAN is 13.53% and 10.97% higher than those of USAD and OmniAnomaly, respectively. It can be seen that RTGAN can effectively improve the accuracy and robustness of anomaly detection.

Key words: Generative Adversarial Network (GAN), anomaly detection, time series, stacked Long Short-Term Memory (LSTM) network, AutoEncoder (AE), re-encoding
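
The idea of using a re-encoding error as a part of the anomaly score can be illustrated with a minimal sketch. The abstract does not give the exact form of the score, so everything below is an assumption made for illustration: the toy `encode` and `decode` functions stand in for the trained encoder and generator, and the weighted sum with weight `alpha` of a reconstruction error and a latent-space re-encoding error is a hypothetical combination, not the formula of RTGAN.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def encode(window):
    # Hypothetical encoder: maps a window to a 2-D latent code
    # (mean and range). A real model would use a trained network.
    mean = sum(window) / len(window)
    return [mean, max(window) - min(window)]


def decode(z, length):
    # Hypothetical generator: rebuilds a flat window at the encoded mean.
    return [z[0]] * length


def anomaly_score(window, alpha=0.5):
    # alpha is an assumed weighting, not a value from the paper.
    z = encode(window)                 # first encoding of the real sample
    x_hat = decode(z, len(window))     # generated (reconstructed) sample
    z_hat = encode(x_hat)              # re-encoding of the generated sample
    recon_error = l2(window, x_hat)    # difference in data space
    reenc_error = l2(z, z_hat)         # difference in latent space
    return alpha * recon_error + (1 - alpha) * reenc_error


normal = [1.0, 1.0, 1.0, 1.0]
spike = [1.0, 1.0, 9.0, 1.0]
print(anomaly_score(normal), anomaly_score(spike))
```

In this toy setting a flat window is reconstructed exactly and scores zero, while a window containing a spike receives a larger score from both error terms. In practice a threshold on the score would separate normal from anomalous windows.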

CLC Number: