Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (5): 1335-1339. DOI: 10.11772/j.issn.1001-9081.2019111970

• Data science and technology •

Stream data anomaly detection method based on long short-term memory network and sliding window

QIU Yuan1, CHANG Xiangmao1, QIU Qian2, PENG Cheng1, SU Shanting1

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China
  2. College of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
  • Received: 2019-11-04 Revised: 2019-11-25 Online: 2020-05-10 Published: 2020-05-15
  • Contact: CHANG Xiangmao
  • About the authors: QIU Yuan, born in 1995, from Shijiazhuang, Hebei, M. S. candidate, CCF member. Her research interests include anomaly detection and deep learning. CHANG Xiangmao, born in 1982, from Zibo, Shandong, Ph. D., associate professor. His research interests include the Internet of Things, intelligent health monitoring based on wearable devices, and sensor data processing and analysis using machine learning algorithms. QIU Qian, born in 1995, from Shijiazhuang, Hebei, M. S. candidate. Her research interests include social networks and deep learning. PENG Cheng, born in 1995, from Hefei, Anhui, M. S. candidate, CCF member. His research interests include state detection and deep learning. SU Shanting, born in 1994, from Changshu, Jiangsu, M. S. candidate, CCF member. Her research interests include fault detection and machine learning.


Abstract:

To address the large volume, rapid generation, and concept drift of stream data, a stream data anomaly detection method based on a Long Short-Term Memory (LSTM) network and a sliding window is proposed. First, the LSTM network is used to predict the data, and the difference between each predicted value and the corresponding actual value is computed. For each data point, an appropriate sliding window is selected, the distribution of all differences within the window is modeled, and the anomaly probability of the data point is calculated from the probability density of its difference under the current distribution. The LSTM network not only predicts the data but also learns while predicting, so the network is updated and adjusted in real time to keep the model valid. The sliding window makes the assignment of anomaly scores more reasonable. Finally, experiments were carried out on simulated data constructed from real data. The results show that, in a low-noise environment, the average Area Under Curve (AUC) of the proposed method is 0.187 higher than that of direct difference detection and 0.05 higher than that of the Abnormal data Distribution Modeling (ADM) method.

Key words: stream data, anomaly detection, sliding window, Long Short-Term Memory (LSTM) network, neural network
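
The scoring step described in the abstract (prediction differences modeled over a sliding window, then scored by how unlikely each difference is) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state which distribution is fitted, so a Gaussian is assumed here; the window length of 20, the `erf`-based score, and the function name `anomaly_scores` are likewise illustrative assumptions. A constant predictor stands in for the LSTM network so that the example stays self-contained.

```python
import math
from collections import deque


def anomaly_scores(actual, predicted, window=20, eps=1e-9):
    """Score each point from its prediction difference (residual).

    Assumption: residuals inside the sliding window are modeled as Gaussian.
    The score is P(|Z| < |z|) for a standard normal Z, so it lies in [0, 1]
    and grows as the current residual becomes less likely under the window.
    """
    residuals = deque(maxlen=window)  # keeps only the most recent residuals
    scores = []
    for y, y_hat in zip(actual, predicted):
        r = y - y_hat
        residuals.append(r)
        n = len(residuals)
        mean = sum(residuals) / n
        var = sum((x - mean) ** 2 for x in residuals) / n
        std = math.sqrt(var) + eps  # eps avoids division by zero
        z = abs(r - mean) / std
        scores.append(math.erf(z / math.sqrt(2)))
    return scores


# Synthetic stream: small alternating noise around 10, one injected spike.
actual = [10.0 + 0.1 * (-1) ** i for i in range(60)]
actual[40] += 5.0

# Stand-in for the LSTM predictions (hypothetical constant predictor).
predicted = [10.0] * 60

scores = anomaly_scores(actual, predicted, window=20)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 4))
```

Because the mean and standard deviation are re-estimated from only the most recent differences, the score adapts when the error level of the predictor drifts, which is the motivation the abstract gives for using a sliding window rather than thresholding the raw difference.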

