Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (5): 1329-1334. DOI: 10.11772/j.issn.1001-9081.2019091631

• Data Science and Technology •

Time series anomaly detection method based on autoencoder and HMM

HUO Weigang, WANG Huifang   

  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2019-09-24 Revised: 2019-10-19 Online: 2020-05-10 Published: 2020-05-15
  • Contact: HUO Weigang
  • About the authors: HUO Weigang, born in 1978, a native of Hongtong, Shanxi, Ph.D., associate professor, CCF member. His research interests include data mining and fuzzy classification. WANG Huifang, born in 1993, a native of Datong, Shanxi, M.S. candidate. Her research interests include data mining.
  • Supported by:

    This work was partially supported by the Civil Aviation Joint Research Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (U1633110), and by the Civil Aviation University of China Special Project of the Fundamental Research Funds for the Central Universities (3122019190).

Abstract:

Existing time series anomaly detection models based on the Hidden Markov Model (HMM) rely on symbolization methods that represent the original time series poorly. To address this problem, an Autoencoder and HMM-based Anomaly Detection (AHMM-AD) method was proposed. Firstly, the time series samples were segmented with a sliding window, the segments were grouped by position into segment sample sets, and the autoencoder for each segment position was trained on the corresponding segment sample set taken from the normal time series. Then, the autoencoders were used to obtain a low-dimensional feature representation of each time series segment, and the time series sample set was symbolized by K-means clustering of these feature vectors. Finally, an HMM was built from the symbol sequences of the normal time series, and anomalies were detected according to the output probability of each test sample under that HMM. Experimental results on multiple public benchmark datasets show that, compared with the existing HMM-based time series anomaly detection model, AHMM-AD improves precision, recall and F1 score by 0.172, 0.477 and 0.313 on average, respectively; compared with the autoencoder-based time series anomaly detection model, the average improvements are 0.108, 0.450 and 0.319, respectively. These results indicate that AHMM-AD can extract nonlinear features from time series, overcomes the poor representation of time series in the symbolization step of existing HMM-based modeling, and significantly improves time series anomaly detection performance.

Key words: autoencoder, symbol sequence, Hidden Markov Model (HMM), anomaly detection, time series
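
The first and last steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `stride` parameter, the toy HMM parameters and the threshold rule are all assumptions made for illustration, and the autoencoder and K-means steps are omitted, so the symbol sequences are taken as already available. The sketch groups sliding-window segments by position, then scores a symbol sequence with the scaled forward algorithm of a discrete HMM.

```python
import math

def segment_by_position(series_set, window, stride):
    """Cut equal-length series into sliding-window segments, grouped by position."""
    n_positions = (len(series_set[0]) - window) // stride + 1
    groups = [[] for _ in range(n_positions)]
    for series in series_set:
        for p in range(n_positions):
            start = p * stride
            groups[p].append(series[start:start + window])
    # groups[p] is the sample set that would train the autoencoder for position p
    return groups

def forward_log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | HMM) for a non-empty sequence."""
    n_states = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbols[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this HMM
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

def is_anomalous(symbols, hmm, threshold):
    """Assumed decision rule: flag a sequence whose log-likelihood is below threshold."""
    return forward_log_likelihood(symbols, *hmm) < threshold

# Hypothetical 2-state, 2-symbol HMM: (start, transition, emission) probabilities.
TOY_HMM = (
    [0.5, 0.5],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.1, 0.9]],
)
```

In practice the HMM parameters would be estimated from the symbol sequences of normal samples, and the threshold would have to be chosen separately, for example from the scores of held-out normal data; the abstract does not state how the decision boundary is set.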

CLC Number: