Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (11): 3204-3210. DOI: 10.11772/j.issn.1001-9081.2018041252

• The 7th China Conference on Data Mining (CCDM 2018) •

基于频繁模式发现的时间序列异常检测方法

李海林, 邬先利   

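  1. Department of Information Management, Huaqiao University, Quanzhou, Fujian 362021, China
  • Received: 2018-05-10  Revised: 2018-06-04  Published: 2018-11-10  Released: 2018-11-10
  • Corresponding author: WU Xianli
  • About the authors: LI Hailin (1982-), male, born in Longyan, Fujian, associate professor, Ph.D.; main research interests: data mining and decision support. WU Xianli (1994-), female, born in Zunyi, Guizhou, master's student; main research interests: data mining and journal data analysis.
  • Funding:
    National Natural Science Foundation of China (71771094, 61300139); Social Science Planning Program of Fujian Province (FJ2017B065); Fujian Provincial Colleges and Universities Support Program for New Century Talents (Z1625112).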

Time series anomaly detection method based on frequent pattern discovery

LI Hailin, WU Xianli   

  1. Department of Information Management, Huaqiao University, Quanzhou, Fujian 362021, China
  • Received: 2018-05-10  Revised: 2018-06-04  Online: 2018-11-10  Published: 2018-11-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71771094, 61300139), the Social Science Planning Program of Fujian Province (FJ2017B065), and the Fujian Provincial Colleges and Universities Support Program for New Century Talents (Z1625112).

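Abstract: To address the low efficiency of traditional anomalous-segment detection methods when processing incremental time series, a Time Series Anomaly Detection (TSAD) method based on frequent pattern discovery is proposed. First, the historical input time series data are converted into symbols. Second, the symbolic features are used to find the frequent patterns in the historical sequence data set. Finally, the longest common subsequence matching method is used to measure the similarity between the frequent patterns and the newly added time series data, so that anomalous patterns in the new data can be identified. TSAD is compared with the hydrological time series anomaly detection method based on sliding-window prediction (TSOD) and the hydrological time series anomaly mining method based on extended symbolic aggregate approximation (ESAA). On the three types of time series data selected for the experiments, the detection rate of TSAD exceeds 90% in every case. TSOD achieves a high detection rate, up to 99%, on strongly regular sequences, but a low detection rate on sequences with heavy noise, so its performance depends strongly on the type of data. The detection rate of ESAA does not exceed 70% on any of the three types of data. The experimental results show that TSAD can find anomalous segments well in time series anomaly detection.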

Keywords: time series, symbolic aggregate approximation, frequent pattern, anomaly detection, detection rate

Abstract: Aiming at the low efficiency of traditional anomaly detection methods in processing incremental time series, a Time Series Anomaly Detection method based on frequent pattern discovery (TSAD) was proposed. Firstly, the historical input time series data were transformed into symbols. Secondly, the frequent patterns of the historical sequence data set were found from the symbolic features. Finally, the similarity between the frequent patterns and the newly added time series data was measured with the longest common subsequence matching method, so that the abnormal patterns in the newly added data could be found. TSAD was compared with Time Series Outlier Detection based on sliding window prediction (TSOD) and Extended Symbolic Aggregate Approximation based anomaly mining of hydrological time series (ESAA). The detection rate of TSAD is more than 90% for all three types of time series data selected for the experiment. TSOD has a high detection rate, up to 99%, for more regular sequences, but a low detection rate for noisy sequences, so its performance depends strongly on the type of data. The detection rate of ESAA is not more than 70% for any of the three types of data. The experimental results show that TSAD can detect abnormal patterns of time series well.

Key words: time series, Symbolic Aggregate Approximation (SAX), frequent pattern, anomaly detection, detection rate
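
The three steps named in the abstract (symbolization, frequent pattern discovery, and longest common subsequence matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the alphabet size, segment length, support threshold, or similarity normalization, so the four-letter SAX alphabet, the fixed-length sliding window, the ratio `LCS / max(len)` and the `threshold` parameter below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# Standard SAX breakpoints that split a unit normal into 4 equiprobable
# regions. The alphabet size of 4 is an assumption, not from the paper.
BREAKPOINTS = (-0.6745, 0.0, 0.6745)
ALPHABET = "abcd"


def symbolize(series, n_segments):
    """Z-normalize, average over equal-width segments, map means to letters."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    seg_len = len(z) // n_segments
    word = []
    for i in range(n_segments):
        seg_mean = mean(z[i * seg_len:(i + 1) * seg_len])
        # Letter index = number of breakpoints lying below the segment mean.
        word.append(ALPHABET[sum(seg_mean > b for b in BREAKPOINTS)])
    return "".join(word)


def frequent_patterns(word, length, min_support):
    """Return the fixed-length substrings occurring at least min_support times."""
    counts = Counter(word[i:i + length] for i in range(len(word) - length + 1))
    return {p for p, c in counts.items() if c >= min_support}


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_anomalous(pattern, frequent, threshold):
    """Flag a new pattern whose best LCS similarity to any frequent pattern
    falls below threshold (similarity definition is an assumption)."""
    best = max(
        (lcs_length(pattern, f) / max(len(pattern), len(f)) for f in frequent),
        default=0.0,
    )
    return best < threshold


history = "abcabcabcabd"                      # already-symbolized history
freq = frequent_patterns(history, length=3, min_support=3)
print(sorted(freq))
print(is_anomalous("ddd", freq, threshold=0.6))
print(is_anomalous("abd", freq, threshold=0.6))
```

In this sketch the history is symbolized once and only the newly arrived segment is compared against the stored frequent patterns, which is one way to read the abstract's emphasis on incremental data; how the paper actually updates its pattern set is not stated in the abstract.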
