Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (6): 1708-1715. DOI: 10.11772/j.issn.1001-9081.2021061410

• National Open Distributed and Parallel Computing Conference 2021 (DPCS 2021)

Malicious code detection method based on attention mechanism and residual network

Yang ZHANG, Jiangbo HAO

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
  • Received: 2021-08-06 Revised: 2021-09-10 Accepted: 2021-10-20 Online: 2022-01-10 Published: 2022-06-10
  • Contact: Yang ZHANG
  • About author: HAO Jiangbo, born in 1996, M.S. candidate. His research interests include intelligent software analysis.
  • Supported by:
    National Natural Science Foundation of China (61440012); Key Basic Research Project of Hebei Fundamental Research Plan (189601106D); Key Project of Higher Education Research Program of Hebei Province (ZD2019093)

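Malicious code detection method based on attention mechanism and residual network

Yang ZHANG, Jiangbo HAO

  1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018
  • Corresponding author: Yang ZHANG
  • About the author: HAO Jiangbo (born 1996), male, from Xingtai, Hebei, master's student; main research interest: intelligent software analysis.
  • Funding:
    National Natural Science Foundation of China (61440012); Key Basic Research Project of the Hebei Fundamental Research Plan (18960106D); Key Project of the Higher Education Research Program of the Hebei Provincial Department of Education (ZD2019093)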

Abstract:

Existing deep-learning-based malicious code detection methods suffer from insufficient feature extraction and low accuracy. To address this, a malicious code detection method based on an attention mechanism and a Residual Network (ResNet), called ARMD, was proposed. To support training, the hash values of 47 580 malicious and benign code samples were obtained from the Kaggle website, and the APIs called by each sample were extracted with the analysis tool VirusTotal. The called APIs were then consolidated into 1 000 distinct APIs that serve as detection features, from which the training sample data was constructed. Next, each sample was labeled as benign or malicious according to the VirusTotal analysis results, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the classes. Finally, a ResNet incorporating the attention mechanism was built and trained to perform malicious code detection. Experimental results show that ARMD achieves a malicious code detection accuracy of 97.76%, and that, compared with existing detection methods based on Convolutional Neural Network (CNN) and ResNet models, ARMD improves the average precision by at least 2 percentage points, verifying its effectiveness.
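
The data-preparation steps in the abstract (API-based features, then SMOTE balancing) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the API features are encoded, so a binary "called / not called" vector is assumed here; the toy samples, labels, and the helper names `build_vocabulary`, `encode`, and `smote` are hypothetical; and the oversampler is a simplified pure-Python version of SMOTE rather than a library routine.

```python
import random

def build_vocabulary(samples):
    """Sorted list of the distinct API names seen across all samples."""
    return sorted({api for calls in samples for api in calls})

def encode(api_calls, vocabulary):
    """Binary feature vector (assumed encoding): 1.0 if the API is called."""
    called = set(api_calls)
    return [1.0 if api in called else 0.0 for api in vocabulary]

def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy data: (APIs called, label) with 0 = benign, 1 = malicious.
samples = [
    (["CreateFileW", "WriteFile"], 0),
    (["CreateFileW", "ReadFile"], 0),
    (["ReadFile", "CloseHandle"], 0),
    (["CreateFileW", "CloseHandle"], 0),
    (["VirtualAlloc", "WriteProcessMemory"], 1),
    (["VirtualAlloc", "CreateRemoteThread"], 1),
]

vocabulary = build_vocabulary([calls for calls, _ in samples])
X = [encode(calls, vocabulary) for calls, _ in samples]
y = [label for _, label in samples]

# Oversample the smaller class (here, label 1) until both classes match.
minority = [x for x, label in zip(X, y) if label == 1]
n_needed = y.count(0) - y.count(1)
synthetic = smote(minority, n_needed)
X_balanced = X + synthetic
y_balanced = y + [1] * n_needed
```

In the setting described by the abstract, the vocabulary would hold the 1 000 distinct APIs and the balanced matrix would be the input to the attention-augmented ResNet. Note that interpolation produces fractional feature values even when the original features are binary, which is a known property of SMOTE.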

Key words: deep learning, malicious code, attention mechanism, Residual Network (ResNet), SMOTE (Synthetic Minority Over-sampling Technique)

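Abstract:

To address the insufficient feature extraction and low accuracy of existing deep-learning-based malicious code detection methods, a malicious code detection method based on an attention mechanism and a residual network (ResNet), named ARMD, is proposed. To support training of the method, the hash values of 47 580 malicious and benign code samples were obtained from the Kaggle website, and the VirusTotal analysis tool was used to extract the APIs called by each sample. The called APIs were then consolidated into 1 000 distinct APIs, which serve as detection features for constructing the training sample data. Next, the samples were labeled as benign or malicious according to the VirusTotal analysis results, and the SMOTE enhancement algorithm was applied to balance the data samples. Finally, a ResNet incorporating the attention mechanism was built and trained to perform malicious code detection. Experimental results show that ARMD achieves a malicious code detection accuracy of 97.76% and that, compared with existing detection methods based on convolutional neural network (CNN) and ResNet models, its average precision is higher by at least 2 percentage points, verifying the effectiveness of ARMD.

Keywords: deep learning, malicious code, attention mechanism, residual network, SMOTE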

CLC Number: