Journal of Computer Applications, 2023, Vol. 43, Issue (11): 3510-3516. DOI: 10.11772/j.issn.1001-9081.2022111738
Special Issue: Cyberspace Security
• Cyber security •
Chunyong YIN, Yangchun ZHANG
Received: 2022-11-22
Revised: 2023-03-19
Accepted: 2023-03-23
Online: 2023-04-10
Published: 2023-11-10
Contact: Chunyong YIN (yinchunyong@hotmail.com)
About author: YIN Chunyong, born in 1977 in Weifang, Shandong, Ph. D., professor and doctoral supervisor. His research interests include cyberspace security, big data mining, privacy protection, artificial intelligence, and new computing.
Chunyong YIN, Yangchun ZHANG. Unsupervised log anomaly detection model based on CNN and Bi-LSTM[J]. Journal of Computer Applications, 2023, 43(11): 3510-3516.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022111738
Dataset | Collection period/d | Size/GB | Total templates | Training sequences | Log messages | Anomalies | Templates in training set |
---|---|---|---|---|---|---|---|
HDFS | 2 | 1.490 | 30 | 5 000 | 1 175 629 | 16 838 (blocks) | 15 |
BGL | 215 | 0.708 | 378 | 7 500 | 4 747 963 | 348 460 (logs) | 185 |
Tab. 1 Statistics of two datasets
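Model | Category | Detection method | Semantic representation |
---|---|---|---|
LogCluster | Unsupervised | Clustering | No |
ADR | Unsupervised | IM | No |
OES | Supervised | SVM | Yes |
DeepLog | Unsupervised | LSTM | No |
LogAnomaly | Unsupervised | LSTM | Yes |
LogBERT | Self-supervised | BERT | No |
CNN-BiLSTM | Supervised | CNN, BiLSTM | Yes |
LogRobust | Supervised | Bi-LSTM, attention mechanism | Yes |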
Tab. 2 Details of baseline models
Method | HDFS Precision | HDFS Recall | HDFS F1-score | BGL Precision | BGL Recall | BGL F1-score |
---|---|---|---|---|---|---|
LogCluster | 0.993 | 0.371 | 0.540 | 0.955 | 0.640 | 0.766 |
ADR | 0.931 | 0.929 | 0.925 | 0.937 | 1.000 | 0.967 |
OES | 0.978 | 0.974 | 0.976 | 0.932 | 0.981 | 0.956 |
DeepLog | 0.953 | 0.961 | 0.957 | 0.900 | 0.960 | 0.929 |
LogAnomaly | 0.960 | 0.940 | 0.950 | 0.970 | 0.940 | 0.960 |
LogBERT | 0.870 | 0.781 | 0.823 | 0.894 | 0.923 | 0.908 |
CNN-BiLSTM | 0.980 | 0.830 | 0.900 | 0.949 | 0.930 | 0.939 |
LogRobust | 0.980 | 1.000 | 0.999 | 0.912 | 0.964 | 0.937 |
LogCL | 0.986 | 0.987 | 0.986 | 0.973 | 0.993 | 0.983 |
Tab. 3 Experimental results on HDFS and BGL datasets
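The page does not state how the F1-score column is derived. As a minimal sketch, assuming it is the usual harmonic mean of precision and recall, the LogCL row can be checked as follows. The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LogCL precision and recall as reported in the table above.
logcl_hdfs_f1 = f1_score(0.986, 0.987)
logcl_bgl_f1 = f1_score(0.973, 0.993)

print(f"LogCL HDFS F1: {logcl_hdfs_f1:.3f}")
print(f"LogCL BGL F1:  {logcl_bgl_f1:.3f}")
```

Under this assumption both values agree with the reported LogCL F1-scores to three decimal places.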
Training samples/10^3 | Templates | Proportion of new templates/% | Precision | Recall | F1-score |
---|---|---|---|---|---|
3 | 13 | 56.7 | 0.941 | 1.000 | 0.969 |
4 | 15 | 50.0 | 0.953 | 0.992 | 0.972 |
5 | 15 | 50.0 | 0.986 | 0.987 | 0.986 |
6 | 16 | 46.7 | 0.985 | 0.984 | 0.984 |
Tab. 4 Evaluation results of new logs
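The "proportion of new templates" column appears to follow from the template counts. The sketch below assumes the experiment uses HDFS, whose 30 total templates are listed in Tab. 1, and that a "new" template is one absent from the training set. Both points are inferences from the tables rather than statements on the page, and the names `TOTAL_TEMPLATES` and `new_template_share` are hypothetical.

```python
TOTAL_TEMPLATES = 30  # assumption: HDFS total template count from Tab. 1


def new_template_share(seen_in_training: int, total: int = TOTAL_TEMPLATES) -> float:
    """Percentage of templates that never appear in the training set."""
    if not 0 <= seen_in_training <= total:
        raise ValueError("seen_in_training must lie between 0 and total")
    return 100.0 * (total - seen_in_training) / total


# Template counts taken from the table above.
for seen in (13, 15, 16):
    print(seen, round(new_template_share(seen), 1))
```

With these assumptions the computed shares match the 56.7, 50.0 and 46.7 reported in the table.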
Model | Precision | Recall | F1-score |
---|---|---|---|
A | 0.964 | 0.928 | 0.945 |
B | 0.975 | 0.982 | 0.977 |
C | 0.968 | 0.990 | 0.979 |
LogCL | 0.986 | 0.987 | 0.986 |
Tab. 5 Results of ablation experiments