Journal of Computer Applications, 2023, Vol. 43, Issue (11): 3510-3516. DOI: 10.11772/j.issn.1001-9081.2022111738
• Cyber security •
Chunyong YIN, Yangchun ZHANG
Received: 2022-11-22
Revised: 2023-03-19
Accepted: 2023-03-23
Online: 2023-04-10
Published: 2023-11-10
Contact: Chunyong YIN
About author: YIN Chunyong, born in 1977 in Weifang, Shandong, Ph. D., professor, doctoral supervisor. His research interests include cyberspace security, big data mining, privacy protection, artificial intelligence, and new computing. Email: yinchunyong@hotmail.com
Chunyong YIN, Yangchun ZHANG. Unsupervised log anomaly detection model based on CNN and Bi-LSTM[J]. Journal of Computer Applications, 2023, 43(11): 3510-3516.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022111738
Dataset | Collection period/d | Size/GB | Total templates | Training sequences | Log messages | Anomalies | Training templates |
---|---|---|---|---|---|---|---|
HDFS | 2 | 1.490 | 30 | 5 000 | 1 175 629 | 16 838 (blocks) | 15 |
BGL | 215 | 0.708 | 378 | 7 500 | 4 747 963 | 348 460 (logs) | 185 |
Tab. 1 Statistics of two datasets
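Model | Learning type | Detection method | Semantic representation |
---|---|---|---|
LogCluster | Unsupervised | Clustering | No |
ADR | Unsupervised | IM | No |
OES | Supervised | SVM | Yes |
DeepLog | Unsupervised | LSTM | No |
LogAnomaly | Unsupervised | LSTM | Yes |
LogBERT | Self-supervised | BERT | No |
CNN-BiLSTM | Supervised | CNN, BiLSTM | Yes |
LogRobust | Supervised | Bi-LSTM, attention mechanism | Yes |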
Tab. 2 Details of baseline models
Method | HDFS Precision | HDFS Recall | HDFS F1-score | BGL Precision | BGL Recall | BGL F1-score |
---|---|---|---|---|---|---|
LogCluster | 0.993 | 0.371 | 0.540 | 0.955 | 0.640 | 0.766 |
ADR | 0.931 | 0.929 | 0.925 | 0.937 | 1.000 | 0.967 |
OES | 0.978 | 0.974 | 0.976 | 0.932 | 0.981 | 0.956 |
DeepLog | 0.953 | 0.961 | 0.957 | 0.900 | 0.960 | 0.929 |
LogAnomaly | 0.960 | 0.940 | 0.950 | 0.970 | 0.940 | 0.960 |
LogBERT | 0.870 | 0.781 | 0.823 | 0.894 | 0.923 | 0.908 |
CNN-BiLSTM | 0.980 | 0.830 | 0.900 | 0.949 | 0.930 | 0.939 |
LogRobust | 0.980 | 1.000 | 0.999 | 0.912 | 0.964 | 0.937 |
LogCL | 0.986 | 0.987 | 0.986 | 0.973 | 0.993 | 0.983 |
Tab. 3 Experimental results on HDFS and BGL datasets
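The F1-score columns in Tab. 3 can be checked against the precision and recall columns, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not the authors' evaluation code: the function name `f1_score` and the 0.005 tolerance are illustrative assumptions, and the three rows are copied from the HDFS columns of the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the HDFS columns of Tab. 3.
hdfs_rows = {
    "LogCluster": (0.993, 0.371, 0.540),
    "DeepLog": (0.953, 0.961, 0.957),
    "LogCL": (0.986, 0.987, 0.986),
}

# Assumed tolerance: the table rounds every value to three decimals.
TOLERANCE = 0.005

for name, (p, r, reported) in hdfs_rows.items():
    recomputed = f1_score(p, r)
    status = "consistent" if abs(recomputed - reported) <= TOLERANCE else "check"
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {reported:.3f} ({status})")
```

Applying the same function to the other rows is a quick way to spot entries that differ by more than rounding. For example, a precision of 0.980 with a recall of 1.000 gives roughly 0.990. The page does not say whether the reported scores were averaged over several runs, which could explain such gaps.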
Training samples/10^3 | Templates | New-template proportion/% | Precision | Recall | F1-score |
---|---|---|---|---|---|
3 | 13 | 56.7 | 0.941 | 1.000 | 0.969 |
4 | 15 | 50.0 | 0.953 | 0.992 | 0.972 |
5 | 15 | 50.0 | 0.986 | 0.987 | 0.986 |
6 | 16 | 46.7 | 0.985 | 0.984 | 0.984 |
Tab. 4 Evaluation results of new logs
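The third column of Tab. 4 (new-template proportion) appears to follow from the template counts. With the 30 HDFS templates listed in Tab. 1, a training set that covers 13 templates leaves 17 of 30 (56.7%) unseen. The sketch below reproduces that arithmetic under this assumed reading of the column; `unseen_template_ratio` is a hypothetical helper, not something taken from the paper.

```python
TOTAL_HDFS_TEMPLATES = 30  # total template count for HDFS, taken from Tab. 1


def unseen_template_ratio(seen_templates: int,
                          total_templates: int = TOTAL_HDFS_TEMPLATES) -> float:
    """Percentage of templates that never occur in the training set."""
    if total_templates <= 0:
        raise ValueError("total_templates must be positive")
    if not 0 <= seen_templates <= total_templates:
        raise ValueError("seen_templates must lie between 0 and total_templates")
    return 100 * (total_templates - seen_templates) / total_templates


# Templates seen per training-set size (thousands of samples), from Tab. 4.
templates_by_training_size = {3: 13, 4: 15, 5: 15, 6: 16}

for size_k, seen in templates_by_training_size.items():
    ratio = unseen_template_ratio(seen)
    print(f"{size_k}k training samples: {ratio:.1f}% new templates")
```

Under this reading, the reported scores stay above 0.96 even when roughly half of the templates encountered at test time were never seen during training.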
Model | Precision | Recall | F1-score |
---|---|---|---|
A | 0.964 | 0.928 | 0.945 |
B | 0.975 | 0.982 | 0.977 |
C | 0.968 | 0.990 | 0.979 |
LogCL | 0.986 | 0.987 | 0.986 |
Tab. 5 Results of ablation experiments