Classification of malicious code variants based on VGGNet
WANG Bo1,2, CAI Honghao3, SU Yang1,2
1. College of Cryptographic Engineering, Engineering College of PAP, Xi'an, Shaanxi 710086, China; 2. Key Laboratory of Network and Information Security under the Armed Police Force (Engineering College of PAP), Xi'an, Shaanxi 710086, China; 3. College of Information Engineering, Engineering College of PAP, Xi'an, Shaanxi 710086, China
Abstract: Since code reuse is common within a malicious code family, a malicious sample classification method based on code reuse features was proposed. Firstly, the binary sequence of each file was split into the values of the three RGB color channels, converting malicious samples into color images. Then, these images were used to train a malicious sample classification model based on the VGG convolutional neural network. Finally, during the training of the model, the random dropout algorithm was applied to address overfitting, gradient vanishing, and high computational overhead. The method achieves 96.16% average classification accuracy on the 9342 samples from 25 families in the Malimg dataset and can effectively classify malicious code samples. Experimental results show that, compared with grayscale images, converting binary files into color images emphasizes image features more markedly, especially for files whose binary sequences contain repetitive short data segments, and that with a training set whose features are more distinct, the neural network can produce a classification model with better performance. Since the preprocessing is simple and classification results are returned quickly, the method is suitable for scenarios with high real-time requirements, such as rapid classification of large-scale malicious samples.
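The byte-to-RGB preprocessing described in the abstract can be illustrated with a short Python sketch. This is a hypothetical implementation for illustration only: the image width of 224 pixels and the zero-padding of the final row are assumptions, not values given by the authors.

import numpy as np
from PIL import Image

def binary_to_rgb_image(path, width=224):
    """Map consecutive byte triples of a binary file to RGB pixels.

    The fixed image width and the zero-padding of the last row are
    illustrative assumptions, not values taken from the paper.
    """
    data = np.fromfile(path, dtype=np.uint8)
    row_bytes = 3 * width                      # one image row = width RGB pixels
    remainder = len(data) % row_bytes
    if remainder:                              # pad so the data fills whole rows
        data = np.concatenate([data, np.zeros(row_bytes - remainder, dtype=np.uint8)])
    pixels = data.reshape(-1, width, 3)        # height x width x (R, G, B)
    return Image.fromarray(pixels, mode="RGB")

# Hypothetical usage: convert one sample and save it for inspection.
# binary_to_rgb_image("sample.bin").save("sample.png")

The resulting color images then feed a VGG-style convolutional classifier trained with random dropout. The Keras sketch below is a heavily reduced stand-in under stated assumptions: the layer configuration, dropout rate, optimizer, and input size are illustrative, not the authors' exact network.

from tensorflow.keras import layers, models

def build_classifier(num_classes=25, input_shape=(224, 224, 3)):
    """A reduced VGG-style stack with random dropout, for illustration only."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                   # random dropout to curb overfitting
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model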
WANG Bo, CAI Honghao, SU Yang. Classification of malicious code variants based on VGGNet. Journal of Computer Applications, 2020, 40(1): 162-167.