Journal of Computer Applications, 2025, Vol. 45, Issue (6): 1911-1921. DOI: 10.11772/j.issn.1001-9081.2024060857
• Cyber security •
Tianchen HUA, Xiaoning MA, Hui ZHI
Received: 2024-06-24
Revised: 2024-09-12
Accepted: 2024-09-13
Online: 2024-10-08
Published: 2025-06-10
Contact: Hui ZHI
About author: HUA Tianchen, born in 2000, a native of Hefei, Anhui, M. S. candidate. His research interests include network and information security and malware detection.
Tianchen HUA, Xiaoning MA, Hui ZHI. Portable executable malware static detection model based on shallow artificial neural network[J]. Journal of Computer Applications, 2025, 45(6): 1911-1921.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024060857
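Feature category | Feature name | Feature length |
---|---|---|
Format-agnostic | Byte histogram | 256 |
Format-agnostic | Byte entropy histogram | 256 |
Format-agnostic | Strings | 104 |
Format-dependent | General file information | 10 |
Format-dependent | Header information | 62 |
Format-dependent | Import table | 1 280 |
Format-dependent | Export table | 128 |
Format-dependent | Section information | 255 |
Additional feature | Data directories | 30 |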
Tab. 1 PE file features of EMBER dataset
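As a quick arithmetic check on Tab. 1, the listed feature lengths can be summed. The sketch below is illustrative only: the key names are English labels chosen here, and the concatenation order used for the offsets is an assumption (the page does not state it). The lengths themselves are those in the table. The full sum is 2 381, and the sum without the data-directory group is 2 351; both figures also appear as input dimensions in Tab. 9.

```python
# Feature lengths as listed in Tab. 1; key names are illustrative labels.
EMBER_FEATURE_LENGTHS = {
    "byte_histogram": 256,
    "byte_entropy_histogram": 256,
    "strings": 104,
    "general_file_info": 10,
    "header_info": 62,
    "imports": 1280,
    "exports": 128,
    "section_info": 255,
    "data_directories": 30,
}


def total_feature_length(lengths):
    """Sum the lengths of all feature groups."""
    return sum(lengths.values())


def feature_offsets(lengths):
    """Map each group to a (start, end) slice, assuming the groups are
    concatenated in dictionary order (an assumption, not from the paper)."""
    offsets, start = {}, 0
    for name, n in lengths.items():
        offsets[name] = (start, start + n)
        start += n
    return offsets


print(total_feature_length(EMBER_FEATURE_LENGTHS))
print(feature_offsets(EMBER_FEATURE_LENGTHS)["imports"])
```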
Type | Layer | Units | Output shape | Activation |
---|---|---|---|---|
Input layer | | 2 151 | (None, 2 151) | |
Hidden layer 1 | dropout1 | | (None, 2 151) | |
Hidden layer 1 | dense1 | 1 600 | (None, 1 600) | ReLU |
Hidden layer 2 | dropout2 | | (None, 1 600) | |
Hidden layer 2 | dense2 | 800 | (None, 800) | ReLU |
Output layer | dropout3 | | (None, 800) | |
Output layer | dense3 | 1 | (None, 1) | Sigmoid |
Tab. 2 Parametric details of SANN
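The layer stack in Tab. 2 can be read as a plain feed-forward pass. The following is a minimal pure-Python sketch, not the authors' implementation: it assumes each dropout layer precedes its dense layer and acts as the identity at inference time, that ReLU follows dense1 and dense2, and that Sigmoid follows dense3. The function names and the random initialisation are hypothetical, and the demo uses toy layer widths so it runs instantly; the widths in the table are 2 151, 1 600, 800 and 1.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one dot product plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Hypothetical initialisation: small uniform weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def sann_forward(x, params):
    """Inference-time pass; dropout is the identity here, so it is omitted."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(dense(x, w1, b1))        # dropout1 -> dense1 -> ReLU
    h2 = relu(dense(h1, w2, b2))       # dropout2 -> dense2 -> ReLU
    return sigmoid(dense(h2, w3, b3)[0])  # dropout3 -> dense3 -> Sigmoid


# Toy widths for a fast demo; Tab. 2 uses 2151 -> 1600 -> 800 -> 1.
rng = random.Random(0)
toy_params = [init_layer(8, 6, rng), init_layer(6, 4, rng), init_layer(4, 1, rng)]
print(sann_forward([0.5] * 8, toy_params))
```

The output is a single value in (0, 1), which would typically be thresholded to give a malicious or benign decision.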
Layer | Computational complexity | Space complexity (number of parameters) |
---|---|---|
Total | 9 394 800 | 4 724 801 |
FC1 | 6 833 200 | 3 443 200 |
FC2 | 2 560 000 | 1 280 800 |
FC3 | 1 600 | 801 |
Tab. 3 Model complexity of SANN
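The figures in Tab. 3 can be checked from the layer sizes in Tab. 2. The sketch below assumes that the parameter count of a dense layer is weights plus biases, and that computation is counted as one multiply and one add per weight (2 × inputs × outputs); the second convention is an assumption, since the page does not state how computational complexity was counted. Under these assumptions the parameter counts and the FC2 and FC3 operation counts match the table, while FC1 comes out as 6 883 200 rather than the listed 6 833 200, so the table may use a different convention for that layer or the listed figure may contain a slip.

```python
# Layer sizes from Tab. 2 (input 2151 -> 1600 -> 800 -> 1).
LAYER_DIMS = [("FC1", 2151, 1600), ("FC2", 1600, 800), ("FC3", 800, 1)]


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def dense_ops(n_in, n_out):
    """Assumed convention: one multiply and one add per weight."""
    return 2 * n_in * n_out


def summarize(layer_dims):
    """Return (name, ops, params) per layer, followed by a total row."""
    rows = [(name, dense_ops(i, o), dense_params(i, o))
            for name, i, o in layer_dims]
    total = ("Total", sum(r[1] for r in rows), sum(r[2] for r in rows))
    return rows + [total]


for name, ops, params in summarize(LAYER_DIMS):
    print(f"{name:5s} ops={ops:>9,d} params={params:>9,d}")
```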
Parameter | Value |
---|---|
Input dimension | 2 151 |
Optimizer | Adam |
Learning rate | 0.001 |
Loss function | BCELoss |
Batch size | 256 |
Training epochs | 10 |
Dataset split | 75% training, 25% test |
Tab. 4 Training parameters of SANN
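For readers unfamiliar with the loss named in Tab. 4, the sketch below spells out binary cross-entropy in plain Python and collects the table's settings in a dictionary. It is illustrative only: the dictionary keys, the `eps` clamp, and the `steps_per_epoch` helper with its example sample count are assumptions added here, not details from the paper.

```python
import math

# Settings copied from Tab. 4; the key names are illustrative.
TRAINING_CONFIG = {
    "input_dim": 2151,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "loss": "BCELoss",
    "batch_size": 256,
    "epochs": 10,
    "train_fraction": 0.75,
}


def bce_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over a batch of predicted probabilities.

    eps clamps probabilities away from 0 and 1 to avoid log(0);
    the clamp value is an assumption of this sketch.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def steps_per_epoch(n_train, batch_size):
    """Number of mini-batches per epoch, counting a final partial batch."""
    return math.ceil(n_train / batch_size)


print(bce_loss([0.9, 0.1], [1, 0]))
# Hypothetical training-set size, used only to show the helper.
print(steps_per_epoch(1000, TRAINING_CONFIG["batch_size"]))
```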
Metric | Value/% |
---|---|
AUC | 98.64 |
Precision | 95.33 |
Accuracy | 95.24 |
DR | 95.61 |
F1-Score | 95.22 |
FPR | 4.30 |
FNR | 4.45 |
Tab. 5 Model experimental results
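The threshold-based metrics in Tab. 5 are normally derived from a confusion matrix. The sketch below uses the standard definitions and assumes that DR (detection rate) means the true positive rate; the page itself does not give the formulas, and the counts in the usage line are made up for illustration. AUC is not included because it needs the ranked prediction scores rather than the four counts.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: malicious samples detected/missed; tn/fp: benign samples
    passed/flagged. DR is taken to be the true positive rate (assumption).
    """
    precision = tp / (tp + fp)
    dr = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "dr": dr,
        "f1": 2 * precision * dr / (precision + dr),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }


# Hypothetical counts, for illustration only.
for name, value in detection_metrics(tp=90, fp=5, tn=95, fn=10).items():
    print(f"{name}: {100 * value:.2f}%")
```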
Model | Accuracy/% | Precision/% | AUC/% | DR/% |
---|---|---|---|---|
RF | 92.43 | 93.33 | 97.98 | 92.11 |
kNN | 87.20 | 93.37 | 93.79 | 87.80 |
DT | 90.70 | 93.30 | 93.14 | 91.56 |
LightGBM | 93.67 | 92.09 | 98.64 | 95.43 |
MalConv | 94.05 | 88.15 | 98.55 | 94.96 |
Ref. [ | 75.55 | 79.28 | 87.02 | 95.00 |
Ref. [ | 92.20 | 91.42 | 97.56 | 92.65 |
SANN | 95.24 | 95.33 | 98.74 | 95.64 |
Tab. 6 Comparison of experimental results of SANN and different models
Dataset | Malicious samples | Benign samples | Training set | Test set |
---|---|---|---|---|
EMBER* | 60 000 | 60 000 | 100 000 | 20 000 |
VS1 | 65 536 | 50 000 | 92 428 | 23 108 |
VS2 | 65 536 | 50 000 | 92 428 | 23 108 |
Tab. 7 Comparison of dataset configurations
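The sample counts in Tab. 7 can be cross-checked with a few lines of arithmetic. The sketch below copies the counts from the table (the dictionary layout and function name are choices made here) and verifies that malicious plus benign samples equal training plus test samples for each dataset. It also reports the test fraction, which works out to about 16.7% for EMBER* and about 20% for VS1 and VS2.

```python
# Sample counts copied from Tab. 7.
DATASETS = {
    "EMBER*": {"malicious": 60000, "benign": 60000, "train": 100000, "test": 20000},
    "VS1": {"malicious": 65536, "benign": 50000, "train": 92428, "test": 23108},
    "VS2": {"malicious": 65536, "benign": 50000, "train": 92428, "test": 23108},
}


def split_summary(cfg):
    """Check that the class counts and the train/test split add up,
    and report the test and malicious fractions."""
    total = cfg["malicious"] + cfg["benign"]
    return {
        "total": total,
        "split_consistent": total == cfg["train"] + cfg["test"],
        "test_fraction": cfg["test"] / total,
        "malicious_fraction": cfg["malicious"] / total,
    }


for name, cfg in DATASETS.items():
    s = split_summary(cfg)
    print(f"{name}: total={s['total']}, consistent={s['split_consistent']}, "
          f"test={100 * s['test_fraction']:.1f}%, "
          f"malicious={100 * s['malicious_fraction']:.1f}%")
```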
Dataset | Model | Accuracy | Precision | AUC | DR |
---|---|---|---|---|---|
EMBER* | LightGBM | 87.98 | 93.87 | 93.61 | 81.27 |
EMBER* | MalConv | 86.22 | 91.83 | 93.29 | 79.52 |
EMBER* | Ref. [ | 89.17 | 93.33 | 94.99 | 84.38 |
EMBER* | Ref. [ | 88.13 | 87.83 | 95.37 | 88.54 |
EMBER* | SANN | 91.19 | 90.80 | 96.77 | 91.67 |
VS1 | LightGBM | 91.22 | 92.24 | 96.55 | 90.02 |
VS1 | MalConv | 91.91 | 91.61 | 96.90 | 92.26 |
VS1 | Ref. [ | 87.97 | 86.94 | 95.35 | 89.37 |
VS1 | Ref. [ | 73.10 | 72.66 | 84.45 | 74.33 |
VS1 | SANN | 95.68 | 95.64 | 98.80 | 95.73 |
VS2 | LightGBM | 92.23 | 92.68 | 96.84 | 91.93 |
VS2 | MalConv | 91.41 | 90.42 | 96.62 | 90.33 |
VS2 | Ref. [ | 90.98 | 91.52 | 97.05 | 92.64 |
VS2 | Ref. [ | 73.88 | 79.21 | 83.20 | 64.57 |
VS2 | SANN | 92.58 | 92.91 | 97.50 | 92.18 |
Tab. 8 Comparison of experimental results on different datasets
Model | Input dimension | TPR/% | Training time/min | Feature vectorization time/min |
---|---|---|---|---|
LightGBM | 2 351 | 95.42 | 6 | 4 |
MalConv | — | 94.90 | — | — |
Ref. [ | 2 381 | 94.96 | 11 | 5 |
Ref. [ | 2 381 | 92.67 | 5 | 6 |
SANN | 2 151 | 95.60 | 2 | 3 |
Tab. 9 Performance comparison of different models
Model | Accuracy | Precision | AUC | DR | F1-Score |
---|---|---|---|---|---|
SANN-A | 95.40 | 95.33 | 98.74 | 95.64 | 95.22 |
SANN-B | 96.07 | 94.88 | 98.64 | 95.52 | 95.41 |
Tab. 10 Comparison of experimental results of SANN with different numbers of input features
Model | Training time | Feature vectorization time |
---|---|---|
SANN-A | 2.9 | 3.2 |
SANN-B | 3.6 | 4.6 |
Tab. 11 Time performance comparison of SANN with different input features
1 | AV-TEST Institute. Malware statistics & trends report[EB/OL]. [2024-05-10]. |
2 | DELDAR F, ABADI M. Deep learning for zero-day malware detection and classification: a survey[J]. ACM Computing Surveys, 2024, 56(2): No.36. |
3 | QUAN W, CHEN J, LIU Y, et al. Deep learning-based image and video inpainting: a survey[J]. International Journal of Computer Vision, 2024, 132(7): 2367-2400. |
4 | CHEN Y, WANG Q, WU S, et al. TOMGPT: reliable text-only training approach for cost-effective multi-modal large language model[J]. ACM Transactions on Knowledge Discovery from Data, 2024, 18(7): No.171. |
5 | WANG M, CHEN J, ZHANG X L, et al. End-to-end multi-modal speech recognition on an air and bone conducted speech corpus [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31: 513-524. |
6 | LI F, ZHU Z Y, YAN C, et al. Malware detection method based on ensemble learning technology[J]. Journal of Cyber Security, 2024, 9(1): 137-155. (in Chinese) |
7 | FLEURY N, DUBRUNQUEZ T, ALOUANI I. PDF-malware: an overview on threats, detection and evasion attacks[EB/OL]. [2024-08-04]. |
8 | LING X, WU L, ZHANG J, et al. Adversarial attacks against Windows PE malware detection: a survey of the state-of-the-art [J]. Computers and Security, 2023, 128: No.103134. |
9 | RAFF E, BARKER J, SYLVESTER J, et al. Malware detection by eating a whole exe [C]// Proceedings of the Workshops of the 32nd AAAI Conference on Artificial Intelligence Workshops. Palo Alto: AAAI Press, 2018: 268-276. |
10 | KRČÁL M, ŠVEC O, BÁLEK M, et al. Deep convolutional malware classifiers can learn from raw executables and labels only[EB/OL]. [2024-05-11]. |
11 | MOLLOY C, BANKS J, DING H H, et al. Adversarial variational modality reconstruction and regularization for zero-day malware variants similarity detection[C]// Proceedings of the 2022 IEEE International Conference on Data Mining. Piscataway: IEEE, 2022: 1131-1136. |
12 | LIU X, LIN Y, LI H, et al. A novel method for malware detection on ML-based visualization technique[J]. Computers and Security, 2020, 89: No.101682. |
13 | XUAN B N, LI J. Malware classification method based on improved CNN[J]. Acta Electronica Sinica, 2023, 51(5): 1187-1197. (in Chinese) |
14 | KIM J Y, BU S J, CHO S B. Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders [J]. Information Sciences, 2018, 460/461: 83-102. |
15 | LING X, WU L, DENG W, et al. MalGraph: hierarchical graph neural networks for robust windows malware detection[C]// Proceedings of the 2022 IEEE Conference on Computer Communications. Piscataway: IEEE, 2022: 1998-2007. |
16 | LI S C, WANG J, SONG Y F, et al. TriCh-LKRepNet: a large kernel convolutional malicious code classification network for structure reparameterization and triple-channel mapping[J]. Acta Electronica Sinica, 2024, 52(7): 2331-2340. (in Chinese) |
17 | KYADIGE A, RUDD E M, BERLIN K. Learning from context: a multi-view deep learning architecture for malware detection[C]// Proceedings of the 2020 IEEE Security and Privacy Workshops. Piscataway: IEEE, 2020: 1-7. |
18 | MILLAR S, McLAUGHLIN N, MARTINEZ DEL RINCON J, et al. Multi-view deep learning for zero-day Android malware detection[J]. Journal of Information Security and Applications, 2021, 58: No.102718. |
19 | ANDERSON H S, ROTH P. EMBER: an open dataset for training static PE malware machine learning models[EB/OL]. [2024-05-11]. |
20 | YONG WONG M, LANDEN M, ANTONAKAKIS M, et al. An inside look into the practice of malware analysis[C]// Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2021: 3053-3069. |
21 | DEMETRIO L, BIGGIO B, LAGORIO G, et al. Explaining vulnerabilities of deep learning to adversarial malware binaries[C]// Proceedings of the 3rd Italian Conference on Cyber Security. Aachen: CEUR-WS.org, 2019: No.9. |
22 | project LIEF. LIEF[CP/OL]. [2024-05-06]. |
23 | CARRERA E. pefile[CP/OL]. [2024-05-09]. |
24 | VINAYAKUMAR R, ALAZAB M, SOMAN K P, et al. Robust intelligent malware detection using deep learning[J]. IEEE Access, 2019, 7: 46717-46738. |
25 | VINAYAKUMAR R, SOMAN K P. DeepMalNet: evaluating shallow and deep networks for static PE malware detection[J]. ICT Express, 2018, 4(4): 255-258. |
26 | SINGH P, BORGOHAIN S K, SARKAR A K, et al. Feed-Forward Deep Neural Network (FFDNN)-based deep features for static malware detection[J]. International Journal of Intelligent Systems, 2023, 2023: No.9544481. |
27 | LAD S S, ADAMUTHE A C. Improved deep learning model for static PE files malware detection and classification[J]. International Journal of Computer Network and Information Security, 2022, 14(2): 14-26. |
28 | KE G, MENG Q, FINLEY T, et al. LightGBM: a highly efficient gradient boosting decision tree[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 3149-3157. |
29 | VirusShare. VirusShare.com: because sharing is caring[EB/OL]. [2024-07-23]. |
30 | LEE A. Latest entries of the portable freeware collection[EB/OL]. [2024-07-23]. |
31 | RIBEIRO M T, SINGH S, GUESTRIN C. “Why should I trust you?”: explaining the predictions of any classifier[C]// Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1135-1144. |
32 | GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. [2024-05-15]. |