Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (6): 1911-1921. DOI: 10.11772/j.issn.1001-9081.2024060857

• Cyber security •

Portable executable malware static detection model based on shallow artificial neural network

Tianchen HUA1, Xiaoning MA1, Hui ZHI2

  1. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  2. TravelSky Technology Limited, Beijing 101300, China
  • Received: 2024-06-24 Revised: 2024-09-12 Accepted: 2024-09-13 Online: 2024-10-08 Published: 2025-06-10
  • Contact: Hui ZHI
  • About author: HUA Tianchen, born in 2000 in Hefei, Anhui, M. S. candidate. His research interests include network and information security and malware detection.
    MA Xiaoning, born in 1979 in Tianjin, Ph. D., associate professor. His research interests include civil aviation informatization and cyberspace security.
    ZHI Hui, born in 1994 in Baoding, Hebei, M. S., senior engineer. Her research interests include civil aviation informatization. zhihui@travelsky.com.cn
  • Supported by:
    Matching Fund of National Natural Science Foundation of China (3122023PT10)

Abstract:

Deep learning-based Portable Executable (PE) malware detection methods often rely on imbalanced or incomplete datasets, and overly deep network structures or large feature sets increase their computing resource overhead and running time. To address these problems, a PE malware static detection model based on a Shallow Artificial Neural Network (SANN) was proposed. Firstly, the LIEF (Library to Instrument Executable Formats) library was used to build a PE feature extractor that extracts PE file samples from the EMBER dataset, and a feature combination was proposed; this feature set contains fewer PE file features, which reduces the feature space and the number of model parameters while improving the performance of the deep learning model. Secondly, feature vectors were generated, and unlabeled samples were removed through data cleaning. Thirdly, the different feature values in the feature set were normalized. Finally, the feature vectors were input into SANN for training and testing. Experimental results show that SANN achieves a recall of 95.64% and an accuracy of 95.24%; compared with the MalConv and LightGBM models, its accuracy is higher by 1.19 and 1.57 percentage points, respectively. The total working time of SANN is about half that of LightGBM, the fastest of the comparison models. In addition, SANN is resilient to unknown attacks and still maintains a high level of detection.

Key words: malware, static detection, deep learning, Shallow Artificial Neural Network (SANN), Portable Executable (PE) file
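
The steps named in the abstract (data cleaning, normalization, training a shallow network, and reporting recall and accuracy) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the label convention (-1 for unlabeled samples), min-max scaling, the single tanh hidden layer, all hyperparameters, and the two toy feature columns are assumptions chosen only for illustration, and feature extraction with LIEF is omitted.

```python
import math
import random

UNLABELED = -1  # assumption: -1 = unlabeled, 0 = benign, 1 = malicious


def drop_unlabeled(X, y):
    """Data cleaning: keep only the samples that carry a 0/1 label."""
    kept = [(x, t) for x, t in zip(X, y) if t != UNLABELED]
    return [x for x, _ in kept], [t for _, t in kept]


def min_max_normalize(X):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, W1, b1, W2, b2):
    """One hidden tanh layer followed by a sigmoid output unit."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return h, sigmoid(sum(w * a for w, a in zip(W2, h)) + b2)


def train_shallow_net(X, y, hidden=4, epochs=500, lr=0.5, seed=0):
    """Train with per-sample gradient descent on binary cross-entropy."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, p = forward(x, W1, b1, W2, b2)
            d_out = p - t  # loss gradient w.r.t. the output pre-activation
            for j in range(hidden):
                # Backpropagate through tanh before W2[j] is updated.
                d_h = d_out * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h
                for i in range(n_in):
                    W1[j][i] -= lr * d_h * x[i]
            b2 -= lr * d_out

    def predict(x):
        return 1 if forward(x, W1, b1, W2, b2)[1] >= 0.5 else 0

    return predict


def recall_and_accuracy(y_true, y_pred):
    """Recall = TP / (TP + FN); accuracy = correct / total."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return recall, correct / len(y_true)


if __name__ == "__main__":
    # Toy rows: [file size in bytes, number of sections]; both are hypothetical.
    X_raw = [[1200.0, 3.0], [98000.0, 9.0], [1500.0, 4.0],
             [87000.0, 8.0], [5000.0, 5.0]]
    y_raw = [0, 1, 0, 1, UNLABELED]
    X, y = drop_unlabeled(X_raw, y_raw)
    X = min_max_normalize(X)
    predict = train_shallow_net(X, y)
    print(recall_and_accuracy(y, [predict(x) for x in X]))
```

The demo evaluates on its own training rows only to keep the sketch short; a real evaluation would use a held-out test split, as the abstract's training and testing step implies.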

CLC Number: