Malicious file detection method based on image texture and convolutional neural network

doi:10.11772/j.issn.1001-9081.2018030691

Journal of Computer Applications, 2018, Vol. 38, Issue 10: 2929-2933.


JIANG Chen, HU Yupeng, SI Kai, KUANG Wenxin

College of Information Science and Engineering, Hunan University, Changsha 410000, China

Received: 2018-04-04; Revised: 2018-05-10; Online: 2018-10-10; Published: 2018-10-13
Supported by:
This work is partially supported by the Huxiang Youth Talent Plan (2017RS3018).

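Corresponding author: HU Yupeng
About the authors: JIANG Chen (1992-), female, from Chuzhou, Anhui, M. S. candidate; her research interests include Android security and distributed cloud storage. HU Yupeng (1981-), male, from Hengyang, Hunan, Ph. D., associate professor; his research interests include cloud storage security and reliability, distributed cloud storage, and Android security. SI Kai (1992-), male, from Zhoukou, Henan, M. S. candidate; his research interests include natural language processing, machine learning, and deep learning. KUANG Wenxin (1993-), female, from Hengyang, Hunan, Ph. D. candidate; her research interests include social networks, cloud computing, and distributed storage.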

Abstract

In a big data environment, traditional malicious file detection methods achieve low accuracy on malicious files that have undergone code mutation and obfuscation, and they generalize poorly to malicious files on different platforms. To address these problems, a malicious file detection method based on image texture and Convolutional Neural Network (CNN) was proposed. Firstly, a grayscale image generation algorithm was used to convert the executable files of the Android and Windows platforms, namely .dex and .exe files, into corresponding grayscale images. Then, a CNN was used to automatically extract and learn the texture features of these grayscale images, thereby constructing a malicious file detection model. Finally, a large number of unknown files were used to test the detection accuracy of the model. In experiments on a large number of malicious samples, the highest accuracy of the proposed model reached 79.6% on the Android platform and 97.6% on the Windows platform, and the average accuracies were approximately 79.3% and 96.8%, respectively. Compared with the texture fingerprint-based malicious code variant detection method [9], the accuracy of the proposed method was improved by about 20%. The experimental results indicate that the proposed method effectively avoids the problems caused by manual feature selection, greatly improves detection accuracy and efficiency, solves the cross-platform detection problem, and achieves an end-to-end malicious file detection model.

Key words: big data, malicious file detection, deep learning, grayscale image, Convolutional Neural Network (CNN)
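The abstract does not spell out the grayscale image generation algorithm or the CNN architecture, so the Python sketch below only illustrates the general idea under stated assumptions. Each byte of a .dex or .exe file becomes one gray pixel (0 to 255); the image width is picked from a hypothetical size-to-width table (`WIDTH_TABLE`, not taken from the paper); and one hand-set convolution kernel followed by ReLU and 2×2 max pooling stands in for a single CNN layer. All names (`choose_width`, `bytes_to_grayscale`, `conv2d_valid`, `relu`, `max_pool2x2`) are hypothetical.

```python
import math
from typing import List, Optional

# Hypothetical size-to-width table: (upper bound in KB, image width in pixels).
# These thresholds are illustrative assumptions, not values from the paper.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256),
               (200, 384), (500, 512), (1000, 768)]
DEFAULT_WIDTH = 1024  # used for files of 1000 KB or more

Image = List[List[int]]


def choose_width(num_bytes: int) -> int:
    """Pick an image width that grows with file size (assumed heuristic)."""
    size_kb = num_bytes / 1024
    for limit_kb, width in WIDTH_TABLE:
        if size_kb < limit_kb:
            return width
    return DEFAULT_WIDTH


def bytes_to_grayscale(data: bytes, width: Optional[int] = None) -> Image:
    """Map each byte (0-255) to one gray pixel, filling rows left to right.

    The last row is zero-padded so that every row has the same width.
    """
    if width is None:
        width = choose_width(len(data))
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def conv2d_valid(img: Image, kernel: Image) -> Image:
    """'Valid' 2-D cross-correlation, the operation a CNN conv layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap: Image) -> Image:
    """Element-wise ReLU activation."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Image) -> Image:
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    if not fmap:
        return []
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Toy input: 2048 bytes cycling through 0..255 (2 KB, so width 32, height 64).
sample = bytes(range(256)) * 8
img = bytes_to_grayscale(sample)
# A horizontal-difference kernel responds to byte-to-byte changes along a row.
features = max_pool2x2(relu(conv2d_valid(img, [[-1, 1]])))
print(len(img), len(img[0]), len(features), len(features[0]))  # prints: 64 32 32 15
```

Mapping bytes directly to pixels keeps neighboring bytes adjacent, so repeated code and data patterns appear as visual texture that convolution kernels can pick up. In a trained CNN the kernels would be learned from labeled benign and malicious images rather than fixed by hand, and several such layers would be stacked before a classifier.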


CLC Number: TP309.5

JIANG Chen, HU Yupeng, SI Kai, KUANG Wenxin. Malicious file detection method based on image texture and convolutional neural network[J]. Journal of Computer Applications, 2018, 38(10): 2929-2933.


References

[1] ZHU J W, WU Z G, GUAN Z, et al. API sequences based malware detection for Android[C]//Proceedings of the 2015 IEEE 12th International Conference on Ubiquitous Intelligence and Computing. Piscataway, NJ:IEEE, 2016:673-676.
[2] ZHANG F Y, ZHAO T Z. Malware detection and classification based on n-grams attribute similarity[C]//Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering and IEEE International Conference on Embedded and Ubiquitous Computing, 2017:793-796.
[3] TANG W, JIN G, HE J, et al. Extending Android security enforcement with a security distance model[C]//Proceedings of the 2011 International Conference on Internet Technology and Applications. Piscataway, NJ:IEEE, 2011:1-4.
[4] LI Y, FAN M Y, WANG G W. Decompilation-based static analysis for malware on Android platform[J]. Computer Systems and Applications, 2012, 21(11):187-189. (in Chinese)
[5] MAO Y X, CAI Z M, TONG L. Malware detection method based on active learning[J]. Journal of Software, 2017, 28(2):384-397. (in Chinese)
[6] LIN X. Malware detection technology research of Android platform based on sandbox[J]. Electronic Design Engineering, 2016, 24(12):48-50, 53. (in Chinese)
[7] YEWALE A, SINGH M. Malware detection based on opcode frequency[C]//Proceedings of the 2016 International Conference on Advanced Communication Control and Computing Technologies. Piscataway, NJ:IEEE, 2016:646-649.
[8] ENCK W, GILBERT P, HAN S, et al. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones[J]. ACM Transactions on Computer Systems, 2014, 32(2): Article No. 5.
[9] HAN X G, QU W, YAO X X, et al. Research on malicious code variants detection based on texture fingerprint[J]. Journal on Communications, 2014, 35(8):125-136. (in Chinese)
[10] LECUN Y, KAVUKCUOGLU K, FARABET C. Convolutional networks and applications in vision[C]//Proceedings of the 2010 IEEE International Symposium on Circuits and Systems. Piscataway, NJ:IEEE, 2010:253-256.
[11] LIU W Y, WEN Y D, YU Z D, et al. Large-margin softmax loss for convolutional neural networks[C]//ICML 2016:Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York:[s.n.], 2016:507-516.
[12] WONGSUPHASAWAT K, SMILKOV D, WEXLER J, et al. Visualizing dataflow graphs of deep learning models in TensorFlow[J]. IEEE Transactions on Visualization and Computer Graphics, 2018, 24(1):1-12.
