Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3831-3840. DOI: 10.11772/j.issn.1001-9081.2021101730
Special Issue: Cyberspace security
• Cyber security •
Jingwei LEI, Peng YI, Xiang CHEN, Liang WANG, Ming MAO
Received: 2021-10-09
Revised: 2022-01-24
Accepted: 2022-02-21
Online: 2022-04-18
Published: 2022-12-10
Contact: Jingwei LEI
About author: YI Peng, born in 1977, Ph.D., research fellow. His research interests include intrusion detection and new network architectures.
Jingwei LEI, Peng YI, Xiang CHEN, Liang WANG, Ming MAO. PDF document detection model based on system calls and data provenance[J]. Journal of Computer Applications, 2022, 42(12): 3831-3840.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101730
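Affected versions (shared by all rows): Acrobat Reader 2017, Acrobat Reader 2020, Acrobat Reader DC, Adobe Acrobat Reader 2017, Adobe Acrobat Reader 2020, Adobe Acrobat Reader DC

CVE ID | Vulnerability cause | Impact |
---|---|---|
CVE-2018-4901 | Buffer overflow | Arbitrary code execution |
CVE-2019-8197 | Buffer overflow | Arbitrary code execution |
CVE-2020-24426 | Out-of-bounds read | Memory leak |
CVE-2020-24427 | Improper input validation | Memory leak |
CVE-2020-24430 | UAF | Arbitrary code execution |
CVE-2020-24431 | Security feature bypass | Dynamic library injection |
CVE-2020-24433 | Improper access control | Privilege escalation |
CVE-2020-24434 | Out-of-bounds read | Memory leak |
CVE-2020-24436 | Out-of-bounds write | Arbitrary code execution |
CVE-2020-24437 | UAF | Arbitrary code execution |
CVE-2020-24438 | UAF | Memory leak |
CVE-2021-21038 | Out-of-bounds write | Arbitrary code execution |
CVE-2021-21044 | Out-of-bounds write | Arbitrary code execution |
CVE-2021-21086 | Out-of-bounds write | Arbitrary code execution |
CVE-2021-28550 | UAF | Arbitrary code execution |
CVE-2021-28553 | UAF | Arbitrary code execution |
CVE-2021-28557 | Out-of-bounds read | Memory leak |
CVE-2021-28560 | Buffer overflow | Arbitrary code execution |
CVE-2021-28562 | UAF | Arbitrary code execution |
CVE-2021-28564 | Out-of-bounds write | Arbitrary code execution |
CVE-2021-28565 | Out-of-bounds read | Memory leak |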
Tab. 1 List of AAR vulnerabilities in recent years
K value | Thread call sequence file size |
---|---|
1 | |
4 | |
8 | |
12 | |
20 | |
Tab. 2 Selection method of K-value
Actual label | Predicted: malicious | Predicted: benign |
---|---|---|
Malicious | TP | FN |
Benign | FP | TN |
Tab. 3 Specific meanings of evaluation indicators
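The four cells of Tab. 3 are the inputs to the precision, recall and F1 score reported in Tab. 4. The sketch below assumes the standard definitions of these metrics, since this page does not reproduce the paper's own formulas. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Precision, recall and F1 with 'malicious' as the positive class."""
    # tn is accepted so the call mirrors the full confusion matrix,
    # but none of the three metrics below uses it.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only (not results from the paper).
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
print(metrics)
```

Because a missed malicious document (FN) lowers recall while a benign document flagged as malicious (FP) lowers precision, the two metrics move in opposite directions as the detection threshold changes. F1 summarises both in one number.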
Threshold | Path selection algorithm | Precision | Recall | F1 score |
---|---|---|---|---|
0.95 | TF-IDF | 0.943 | 0.828 | 0.882 |
0.95 | PROVDETECTOR | 0.960 | 0.984 | 0.972 |
0.95 | NtProvancer | 0.988 | 0.988 | 0.988 |
0.96 | TF-IDF | 0.914 | 0.925 | 0.919 |
0.96 | PROVDETECTOR | 0.955 | 0.994 | 0.974 |
0.96 | NtProvancer | 0.979 | 1.000 | 0.989 |
0.97 | TF-IDF | 0.886 | 0.950 | 0.917 |
0.97 | PROVDETECTOR | 0.927 | 0.994 | 0.959 |
0.97 | NtProvancer | 0.973 | 1.000 | 0.986 |
0.98 | TF-IDF | 0.881 | 0.972 | 0.924 |
0.98 | PROVDETECTOR | 0.917 | 1.000 | 0.957 |
0.98 | NtProvancer | 0.967 | 1.000 | 0.983 |
0.99 | TF-IDF | 0.851 | 0.997 | 0.918 |
0.99 | PROVDETECTOR | 0.907 | 1.000 | 0.951 |
0.99 | NtProvancer | 0.961 | 1.000 | 0.980 |
Tab. 4 Comparison results of three evaluation indicators
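As a reading aid for Tab. 4, the sketch below recomputes each F1 score as the harmonic mean of the tabulated precision and recall and reports the largest gap from the tabulated F1. It assumes the standard F1 definition. The variable and function names are illustrative, and the numbers are copied from the table, so small gaps are expected because the inputs are rounded to three decimals.

```python
# (threshold, algorithm, precision, recall, reported F1), copied from Tab. 4.
ROWS = [
    (0.95, "TF-IDF", 0.943, 0.828, 0.882),
    (0.95, "PROVDETECTOR", 0.960, 0.984, 0.972),
    (0.95, "NtProvancer", 0.988, 0.988, 0.988),
    (0.96, "TF-IDF", 0.914, 0.925, 0.919),
    (0.96, "PROVDETECTOR", 0.955, 0.994, 0.974),
    (0.96, "NtProvancer", 0.979, 1.000, 0.989),
    (0.97, "TF-IDF", 0.886, 0.950, 0.917),
    (0.97, "PROVDETECTOR", 0.927, 0.994, 0.959),
    (0.97, "NtProvancer", 0.973, 1.000, 0.986),
    (0.98, "TF-IDF", 0.881, 0.972, 0.924),
    (0.98, "PROVDETECTOR", 0.917, 1.000, 0.957),
    (0.98, "NtProvancer", 0.967, 1.000, 0.983),
    (0.99, "TF-IDF", 0.851, 0.997, 0.918),
    (0.99, "PROVDETECTOR", 0.907, 1.000, 0.951),
    (0.99, "NtProvancer", 0.961, 1.000, 0.980),
]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Largest absolute difference between recomputed and tabulated F1.
max_gap = max(abs(f1_score(p, r) - f1) for _, _, p, r, f1 in ROWS)
print(f"largest gap between recomputed and tabulated F1: {max_gap:.4f}")
```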
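Time cost | Training stage | Detection stage |
---|---|---|
Total | 251.51 | 60.55 |
Average time for thread call sequence extraction | 19.11 | 17.83 |
Average time for provenance graph construction | 7.93 | 5.84 |
Average time for feature sequence extraction | 26.50 | 23.57 |
Average time for feature library construction / anomaly detection | 197.97 | 13.31 |
Tab. 5 Time cost of training and detection stages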