Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (10): 2960-2966. DOI: 10.11772/j.issn.1001-9081.2020020270

• Cyber security •

Generative adversarial network-based system log-level anomaly detection algorithm

XIA Bin1,2, BAI Yuxuan1, YIN Junjie3

  1. School of Computer Science, School of Software and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China;
  2. Jiangsu Key Laboratory of Big Data Security & Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing Jiangsu 210023, China;
  3. Zhongxing Telecommunication Equipment Corporation, Nanjing Jiangsu 210012, China
  • Received: 2020-03-12 Revised: 2020-04-18 Published: 2020-10-10 Online: 2020-05-15
  • Corresponding author: YIN Junjie
  • About the authors: XIA Bin (born 1989), male, from Nanjing, Jiangsu; lecturer, Ph.D., CCF member; research interests: artificial intelligence for IT operations, recommender systems, deep learning. BAI Yuxuan (born 1995), male, from Jinzhong, Shanxi; M.S. candidate; research interests: deep learning, natural language processing. YIN Junjie (born 1994), male, from Nanjing, Jiangsu; senior engineer, M.S.; research interests: recommender systems, deep learning.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61802205) and the Natural Science Foundation of Jiangsu Higher Education Institutions (18KJB520037).

Abstract: To address the scarcity of anomaly samples and the delayed feedback of anomalies in automated anomaly detection for large-scale software systems, a log-level anomaly detection algorithm based on Generative Adversarial Network (GAN) and attention mechanism was proposed. First, the unstructured logs were converted into structured events through log templates, with each event containing the timestamp, signature and parameters of the log. Second, the sequence of parsed events was divided by a sliding window, and the resulting event patterns, together with the event following each pattern, formed the real sample set. Third, the real event patterns were used as training samples for the attention mechanism-based GAN, and the Recurrent Neural Network (RNN) based generator was trained through adversarial learning until convergence. Finally, given streaming event patterns as input, the generator produced the distribution of normal and abnormal events for each newly arrived pattern, and, with a threshold set by the system administrator, the specific log at the next moment was automatically determined to be a normal or an abnormal event. Experimental results show that the proposed algorithm, which uses a gated recurrent unit network as the attention weight and a Long Short-Term Memory (LSTM) network to parse event patterns, improves precision by 21.7% over the algorithm using only the gated recurrent unit network. In addition, the proposed algorithm improves anomaly detection precision by 7.8% compared with the log-level anomaly detection algorithm LogGAN.

Key words: anomaly detection, Generative Adversarial Network (GAN), attention mechanism, Recurrent Neural Network (RNN), artificial intelligence for IT operations
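
The data preparation and decision steps summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the two log templates, the sample log lines, the function names and the probability values are all hypothetical, and the GAN generator is replaced by a hand-written dictionary of next-event probabilities. The sketch only shows how logs might be parsed into (timestamp, signature, parameters) events, how a sliding window could pair each event pattern with the following event, and how a threshold could turn a predicted distribution into a normal/abnormal decision.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Event(NamedTuple):
    timestamp: str
    signature: str                 # ID of the matched log template
    parameters: Tuple[str, ...]    # variable parts captured by the template


# Hypothetical templates: signature -> regex whose groups capture the variables.
TEMPLATES: Dict[str, re.Pattern] = {
    "E1": re.compile(r"Receiving block (\S+) src: (\S+)"),
    "E2": re.compile(r"Deleting block (\S+)"),
}


def parse_line(line: str) -> Optional[Event]:
    """Turn one raw line '<timestamp> <message>' into a structured event."""
    timestamp, _, message = line.partition(" ")
    for signature, pattern in TEMPLATES.items():
        match = pattern.fullmatch(message)
        if match:
            return Event(timestamp, signature, match.groups())
    return None  # no template matched


def sliding_window_samples(
    signatures: Sequence[str], window: int
) -> List[Tuple[Tuple[str, ...], str]]:
    """Pair every run of `window` consecutive events with the event after it."""
    return [
        (tuple(signatures[i:i + window]), signatures[i + window])
        for i in range(len(signatures) - window)
    ]


def is_anomalous(
    next_event_probs: Dict[str, float], observed: str, threshold: float
) -> bool:
    """Flag the observed event if its predicted probability is below threshold.

    `next_event_probs` stands in for the generator output (assumption).
    """
    return next_event_probs.get(observed, 0.0) < threshold


# Made-up log lines, for illustration only.
lines = [
    "081109-203518 Receiving block blk_1 src: /10.0.0.1",
    "081109-203519 Receiving block blk_2 src: /10.0.0.2",
    "081109-203520 Deleting block blk_1",
    "081109-203521 Deleting block blk_2",
]
events = [e for e in map(parse_line, lines) if e is not None]
signatures = [e.signature for e in events]
samples = sliding_window_samples(signatures, window=2)
```

In this sketch a window of size 2 over four events yields two (pattern, next event) samples. The threshold passed to `is_anomalous` plays the role of the administrator-chosen threshold mentioned in the abstract; how the actual generator scores events is described in the paper itself, not here.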

CLC Number: