Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (6): 1701-1708. DOI: 10.11772/j.issn.1001-9081.2020091383

Special topic: Cyberspace security

Anomaly detection method based on multi-task temporal convolutional network in cloud workflow

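YAO Jie1, CHENG Chunling1, HAN Jing2, LIU Zheng1

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023;
  2. Shanghai Research and Development Center, Zhongxing Telecommunication Equipment Corporation, Shanghai 201203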
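  • Received: 2020-09-07 Revised: 2020-12-17 Published: 2021-06-10 Online: 2020-12-29
  • Corresponding author: LIU Zheng
  • About the authors: YAO Jie (born 1996), male, from Yancheng, Jiangsu, M.S. candidate; research interests: log analysis, deep learning. CHENG Chunling (born 1972), female, from Xi'an, Shaanxi, professor, Ph.D.; research interests: data management, resource management and optimization. HAN Jing (born 1978), female, from Shanghai, senior engineer, M.S.; research interests: machine learning, event mining. LIU Zheng (born 1980), male, from Nanjing, Jiangsu, lecturer, Ph.D., CCF member; research interest: network data mining.
  • Supported by:
    National Key Research and Development Program of China (2018YFB1003702); Industry-University-Research Cooperation Project of Zhongxing Telecommunication Equipment Corporation; Incubation Project of National Natural Science Foundation of China from Nanjing University of Posts and Telecommunications (NY219084).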

Anomaly detection method based on multi-task temporal convolutional network in cloud workflow

YAO Jie1, CHENG Chunling1, HAN Jing2, LIU Zheng1   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023, China;
  2. Shanghai Research and Development Center, Zhongxing Telecommunication Equipment Corporation, Shanghai 201203, China
  • Received: 2020-09-07 Revised: 2020-12-17 Published: 2021-06-10 Online: 2020-12-29
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2018YFB1003702), the Industry-University-Research Cooperation Project of Zhongxing Telecommunication Equipment Corporation, and the Incubation Project of the National Natural Science Foundation of China from Nanjing University of Posts and Telecommunications (NY219084).

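Abstract: The large volume of logs produced during the daily deployment and operation of cloud computing data centers can help operations staff analyze anomalies. Path anomalies and latency anomalies are common in cloud workflows. Traditional anomaly detection methods train a separate learning model for each of these two detection tasks and ignore the correlation between them, which lowers detection accuracy. To address this problem, a log anomaly detection method based on a multi-task temporal convolutional network is proposed. First, event sequences and time sequences are generated from the event templates of the log stream. Then, a deep learning model based on the multi-task temporal convolutional network is trained; by sharing the shallow layers of the temporal convolutional network, the model learns event features and time features in parallel from normal system executions. Finally, the anomalies in cloud computing workflows are analyzed and the corresponding anomaly detection logic is designed. Experimental results on an OpenStack dataset show that, compared with DeepLog, a leading log anomaly detection algorithm, and a method based on Principal Component Analysis (PCA), the proposed method improves anomaly detection accuracy by at least 7.7 percentage points.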

Key words: anomaly detection, log analysis, temporal convolutional network, multi-task learning, cloud workflow

Abstract: The numerous logs generated during daily deployment and operation on cloud computing platforms help system administrators perform anomaly detection. Common anomalies in cloud workflows include pathway anomalies and time delay anomalies. Traditional anomaly detection methods train a separate learning model for each of these two anomaly detection tasks and ignore the correlation between the tasks, which reduces the accuracy of anomaly detection. To solve this problem, an anomaly detection method based on a multi-task temporal convolutional network was proposed. Firstly, the event sequence and the time sequence were generated based on the event templates of the log stream. Then, a deep learning model based on the multi-task temporal convolutional network was trained. In this model, the event and time characteristics were learnt in parallel from normal system execution processes by sharing the shallow layers of the temporal convolutional network. Finally, the anomalies in the cloud computing workflow were analyzed, and the related anomaly detection logic was designed. Experimental results on the OpenStack dataset demonstrate that the proposed method improves the anomaly detection accuracy by at least 7.7 percentage points compared with the state-of-the-art log anomaly detection algorithm DeepLog and the method based on Principal Component Analysis (PCA).

Key words: anomaly detection, log analysis, temporal convolutional network, multi-task learning, cloud workflow
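
The first step described in the abstract, turning a log stream into an event sequence and a time sequence via event templates, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the template patterns, the log line format, the names `TEMPLATES`, `build_sequences`, and `sliding_windows`, and the choice to represent the time sequence as gaps between consecutive matched events are all assumptions made for illustration. The windowing helper shows one common way such sequences are prepared for a next-step prediction model; the abstract does not say how the authors window their data.

```python
import re
from datetime import datetime

# Hypothetical event templates (id, pattern). In practice these would come
# from a log parser run over real logs; these patterns are invented.
TEMPLATES = [
    ("E1", re.compile(r"^Creating instance \S+$")),
    ("E2", re.compile(r"^Instance \S+ spawned successfully$")),
    ("E3", re.compile(r"^Terminating instance \S+$")),
]


def match_template(message):
    """Return the id of the first template that matches, or None."""
    for event_id, pattern in TEMPLATES:
        if pattern.match(message):
            return event_id
    return None


def build_sequences(log_lines):
    """Convert 'ISO-timestamp message' lines into two parallel sequences.

    events: template ids in order of appearance.
    gaps:   seconds elapsed since the previous matched event (0.0 for the first).
    """
    events, gaps = [], []
    previous = None
    for line in log_lines:
        stamp, message = line.split(" ", 1)
        event_id = match_template(message.strip())
        if event_id is None:
            continue  # assumption: lines matching no template are skipped
        moment = datetime.fromisoformat(stamp)
        gaps.append(0.0 if previous is None else (moment - previous).total_seconds())
        events.append(event_id)
        previous = moment
    return events, gaps


def sliding_windows(sequence, size):
    """Yield (history, next_item) pairs for next-step prediction."""
    for i in range(len(sequence) - size):
        yield sequence[i:i + size], sequence[i + size]


# Invented sample lines, not real OpenStack output.
sample_logs = [
    "2020-09-07T10:00:00 Creating instance vm-1",
    "2020-09-07T10:00:04 Instance vm-1 spawned successfully",
    "2020-09-07T10:00:05 heartbeat ok",
    "2020-09-07T10:00:09 Terminating instance vm-1",
]
events, gaps = build_sequences(sample_logs)
print(events)
print(gaps)
```

Keeping the two sequences aligned index by index is what would let a single shared model consume both: position `i` in `events` and position `i` in `gaps` describe the same log entry.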

CLC number: