Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (10): 2761-2767. DOI: 10.3724/SP.J.1087.2012.02761

• Information security •

Hierarchical feature selection method for detection of obfuscated malicious code

ZHANG Jian-fei, CHEN Li-fei, GUO Gong-de

  1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou Fujian 350007, China
  • Received: 2012-04-09 Revised: 2012-06-07 Online: 2012-10-23 Published: 2012-10-01
  • Contact: ZHANG Jian-fei



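  1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007
  • Corresponding author: ZHANG Jian-fei
  • About the authors: ZHANG Jian-fei (born 1988), male, from Hefei, Anhui, master's student, CCF member; main research interest: data mining. CHEN Li-fei (born 1972), male, from Changle, Fujian, associate professor, Ph.D.; main research interests: data mining and pattern recognition. GUO Gong-de (born 1965), male, from Longyan, Fujian, professor, Ph.D.; main research interests: artificial intelligence and data mining.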

Abstract: Obfuscated malicious code can easily evade detection by conventional static methods. Dynamic methods, on the other hand, achieve high detection accuracy but usually consume a large amount of system resources. A hierarchical feature selection method was proposed to improve detection accuracy at relatively low system overhead: features were generated and then selected successively on the oriented layer, the individual layer, the family layer and the global layer. Through this layer-by-layer refinement, the method achieved an appropriate trade-off between feature redundancy and information omission. Experimental results on real-world datasets demonstrate that the proposed method yields high accuracy in detecting obfuscated malicious code and, compared with conventional feature selection methods, requires fewer training samples and generalizes better.

Key words: malicious code detection, obfuscated malicious code, feature selection, hierarchical method, code family
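The layer-by-layer refinement described in the abstract can be pictured with a small Python sketch. The abstract does not state the criterion used on each layer, so everything below is an assumption made for illustration only: features are taken to be token n-grams, each layer uses a simple frequency threshold, and all function and parameter names (`individual_layer`, `family_layer`, `global_layer`, `top_k`, `min_share`) are hypothetical rather than the authors' own.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n=2):
    """Oriented-layer stand-in (assumed): generate candidate features as n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def individual_layer(tokens, n=2, top_k=3):
    """Per sample: keep the top_k most frequent n-grams (assumed criterion)."""
    counts = Counter(ngrams(tokens, n))
    return {gram for gram, _ in counts.most_common(top_k)}


def family_layer(sample_feature_sets, min_share=0.5):
    """Per family: keep features found in at least min_share of its samples."""
    doc_freq = Counter(f for feats in sample_feature_sets for f in feats)
    needed = min_share * len(sample_feature_sets)
    return {f for f, count in doc_freq.items() if count >= needed}


def global_layer(family_features):
    """Across families: drop features shared by every family (assumed criterion)."""
    n_families = len(family_features)
    doc_freq = Counter(f for feats in family_features.values() for f in feats)
    return {f for f, count in doc_freq.items() if count < n_families}


def hierarchical_select(samples, n=2, top_k=3, min_share=0.5):
    """Run the layers in order; samples is a list of (family, tokens) pairs."""
    by_family = defaultdict(list)
    for family, tokens in samples:
        by_family[family].append(individual_layer(tokens, n, top_k))
    family_features = {
        family: family_layer(feature_sets, min_share)
        for family, feature_sets in by_family.items()
    }
    return global_layer(family_features)


# Toy data: two hypothetical families of instruction-token sequences.
samples = [
    ("famA", ["push", "mov", "xor", "push", "mov", "xor"]),
    ("famA", ["push", "mov", "xor", "ret"]),
    ("famB", ["push", "mov", "call", "ret"]),
    ("famB", ["push", "mov", "call", "call"]),
]
selected = hierarchical_select(samples, min_share=1.0)
print(sorted(selected))
```

In this toy run each layer discards a different kind of feature: the family layer removes n-grams that only one sample of a family exhibits, and the global layer removes the n-gram that both families share and that therefore cannot tell them apart. This is one way to read the trade-off between redundancy and information omission; the published method may weigh features quite differently.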

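Abstract (translated from the Chinese): Obfuscated malicious code of various kinds can easily evade conventional static detection, while dynamic detection, although it achieves a better detection rate, consumes a large amount of system resources. To raise the detection rate for obfuscated malicious code under low system overhead, a hierarchical feature selection method is proposed that generates and selects features on the oriented layer, the individual layer, the family layer and the global layer in turn. By refining the features layer by layer, the hierarchical method seeks a balance between feature redundancy and the omission of information. Experimental results on real-world datasets show that the proposed method achieves a high detection rate for obfuscated malicious code and, compared with conventional feature selection methods, requires a smaller training set and generalizes better.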

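Key words (translated from the Chinese): malicious code detection, obfuscated malicious code, feature selection, hierarchical method, code family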
