Journal of Computer Applications, 2018, Vol. 38, Issue (3): 818-823. DOI: 10.11772/j.issn.1001-9081.2017082143

• Computer Software Technology •

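Android malware detection model based on Bagging-SVM

XIE Lixia, LI Shuang

  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2017-09-04 Revised: 2017-11-12 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: XIE Lixia
  • About the authors: XIE Lixia (1974-), female, born in Chongqing, associate professor, M.S., CCF member; research interests: network and information security. LI Shuang (1990-), male, born in Nanyang, Henan, M.S. candidate; research interests: network and information security.
  • Supported by:
    This work is partially supported by the Science and Technology Foundation of Civil Aviation University of China (MHRD201205).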

Abstract: To address the low detection rate caused by data imbalance in Android malware detection, an Android malware detection model based on a Bagging-SVM (Support Vector Machine) ensemble algorithm was proposed. Firstly, permission, intent and component information was extracted from the AndroidManifest.xml file as features. Secondly, an IG-ReliefF hybrid feature selection algorithm was proposed to reduce the dimensionality of the data set, and multiple balanced data sets were constructed by bootstrap sampling. Finally, a Bagging-based SVM ensemble classifier was trained on the balanced data sets and used to detect Android malware. In the classification experiments, when the numbers of benign and malicious samples were balanced, Bagging-SVM and random forest both achieved a detection rate of 99.4%. When the ratio of benign to malicious samples was 4:1, the detection rate of Bagging-SVM was 6.6% higher than that of random forest and AdaBoost, with no loss of detection accuracy. The experimental results show that the proposed model keeps a high detection rate and classification accuracy under data imbalance and can detect the vast majority of malware.

Key words: malware, classification detection, Bagging algorithm, Support Vector Machine (SVM), feature selection
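The feature-extraction step in the abstract can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch and not the authors' extractor. It assumes the manifest has already been decoded to plain-text XML, because inside an APK the manifest is stored as binary XML and needs a decoder such as apktool first. The `perm:`/`intent:`/`component:` prefixes, the choice to record only which component kinds are declared, and the function names are all hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# In a decoded manifest, attribute names carry the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
# Component kinds declared under <application>; treating their presence as features is an assumption.
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def extract_manifest_features(manifest_xml: str) -> set[str]:
    """Collect permission, intent-action and component features from a
    plain-text (already decoded) AndroidManifest.xml string."""
    root = ET.fromstring(manifest_xml.strip())
    features = set()
    # Requested permissions, e.g. android.permission.SEND_SMS
    for elem in root.iter("uses-permission"):
        name = elem.get(ANDROID_NS + "name")
        if name:
            features.add("perm:" + name)
    # Actions listed in intent filters, e.g. android.intent.action.BOOT_COMPLETED
    for elem in root.iter("action"):
        name = elem.get(ANDROID_NS + "name")
        if name:
            features.add("intent:" + name)
    # Presence of each component kind (hypothetical encoding choice)
    for tag in COMPONENT_TAGS:
        if next(root.iter(tag), None) is not None:
            features.add("component:" + tag)
    return features


def to_binary_vector(features: set[str], vocabulary: list[str]) -> list[int]:
    """Encode a feature set as a 0/1 vector over a fixed, ordered vocabulary."""
    return [1 if f in features else 0 for f in vocabulary]


SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>
"""

if __name__ == "__main__":
    print(sorted(extract_manifest_features(SAMPLE_MANIFEST)))
```

A vocabulary collected over the training apps then fixes the column order, so every app becomes a 0/1 vector of the same length.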
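The abstract names an IG-ReliefF hybrid selection algorithm for dimensionality reduction but does not give its combination rule. The sketch below assumes that IG means information gain and uses one plausible two-stage reading. First, it ranks binary features by information gain and keeps the top `ig_keep`. Then it re-ranks the survivors by two-class ReliefF weights and keeps the top `final_keep`. The ReliefF pass uses Hamming distance and `k` nearest hits and misses, and it visits every instance once. The cut-offs, the staging, and all names are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(X, y, j):
    """IG of binary feature j: H(Y) minus the weighted entropy after splitting on X[:, j]."""
    n = len(y)
    gain = entropy(y)
    for v in (0, 1):
        subset = [y[i] for i in range(n) if X[i][j] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def relieff_weights(X, y, features, k=1):
    """Two-class ReliefF weights over the given feature indices (Hamming distance, all instances)."""
    n = len(X)
    w = {j: 0.0 for j in features}

    def dist(a, b):
        return sum(X[a][j] != X[b][j] for j in features)

    for r in range(n):
        # Neighbours sorted by distance, ties broken by index for determinism.
        others = sorted((dist(r, i), i) for i in range(n) if i != r)
        hits = [i for _, i in others if y[i] == y[r]][:k]
        misses = [i for _, i in others if y[i] != y[r]][:k]
        for j in features:
            # Penalise features that differ from same-class neighbours...
            if hits:
                w[j] -= sum(X[r][j] != X[h][j] for h in hits) / (n * len(hits))
            # ...and reward features that differ from other-class neighbours.
            if misses:
                w[j] += sum(X[r][j] != X[m][j] for m in misses) / (n * len(misses))
    return w


def ig_relieff_select(X, y, ig_keep, final_keep, k=1):
    """Hypothetical two-stage filter: IG shortlist, then ReliefF re-ranking."""
    n_features = len(X[0])
    ig = {j: information_gain(X, y, j) for j in range(n_features)}
    stage1 = sorted(range(n_features), key=lambda j: (-ig[j], j))[:ig_keep]
    w = relieff_weights(X, y, stage1, k)
    return sorted(stage1, key=lambda j: (-w[j], j))[:final_keep]
```

Running the cheap IG filter first limits the quadratic-cost ReliefF pass to a shortlist. ReliefF then rewards features whose values agree among nearest same-class neighbours and differ among nearest other-class neighbours.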
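The final two steps are bootstrap sampling into balanced subsets and training a Bagging-based SVM ensemble on them. Both can be sketched without external libraries under the following assumptions, all hypothetical:

- Each bootstrap draw takes, with replacement, as many samples from each class as the minority class contains.
- Each base learner is a linear SVM trained with the Pegasos stochastic sub-gradient method. This is a swapped-in solver, because the abstract states neither the kernel nor the solver.
- Labels are +1 for malware and -1 for benign apps.
- Predictions are combined by majority vote, with ties going to benign.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, rng=None):
    """Linear SVM via Pegasos sub-gradient descent on the hinge loss.
    Labels must be +1/-1; a constant 1.0 feature is appended to act as a bias."""
    rng = rng or random.Random(0)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            x, label = data[rng.randrange(len(data))]
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularisation step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only samples inside the margin move the weights.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision_value(w, x):
    """Raw SVM score w.x + b, with the bias stored as the last weight."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def balanced_bootstrap(X, y, rng):
    """Sample with replacement the same number of rows from every class,
    using the minority-class size, so each subset is balanced."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for _ in range(n):
            Xb.append(rng.choice(rows))
            yb.append(label)
    return Xb, yb


class BaggingSVM:
    """Ensemble of linear SVMs, each trained on its own balanced bootstrap sample."""

    def __init__(self, n_estimators=11, lam=0.01, epochs=50, seed=0):
        self.n_estimators = n_estimators
        self.lam = lam
        self.epochs = epochs
        self.seed = seed
        self.models = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.models = []
        for _ in range(self.n_estimators):
            Xb, yb = balanced_bootstrap(X, y, rng)
            self.models.append(train_linear_svm(Xb, yb, self.lam, self.epochs, rng))
        return self

    def predict(self, X):
        """Majority vote of the base SVMs; a tie is reported as benign (-1)."""
        preds = []
        for x in X:
            votes = sum(1 if decision_value(w, x) > 0 else -1 for w in self.models)
            preds.append(1 if votes > 0 else -1)
        return preds


if __name__ == "__main__":
    # Toy 4:1 data: feature 0 stands for a suspicious permission, feature 1 is noise.
    X = [[0, 0], [0, 1]] * 4 + [[1, 0], [1, 1]]
    y = [-1] * 8 + [1] * 2
    model = BaggingSVM(n_estimators=11).fit(X, y)
    print(model.predict([[1, 0], [0, 1]]))
```

Training each base SVM on a class-balanced resample stops the majority benign class from dominating the hinge loss, which is the imbalance problem the abstract targets. An odd `n_estimators` avoids vote ties.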
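The abstract reports a detection rate and a detection or classification accuracy but does not define them. The sketch below assumes the usual definitions:

- Detection rate is TP/(TP+FN) on the malware class.
- Precision is TP/(TP+FP).
- Accuracy is the share of correct predictions.

The small helper, given the hypothetical name `detection_metrics`, uses 1 for malware and 0 for benign apps. It shows why the detection rate is the metric that exposes imbalance. On a 4:1 benign-to-malicious test set, a classifier that labels every app benign still scores 80% accuracy while detecting no malware.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Detection rate (recall on malware), precision and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }


if __name__ == "__main__":
    # A 4:1 benign-to-malicious test set (1 = malware, 0 = benign).
    y_true = [1] * 4 + [0] * 16
    # A degenerate classifier that calls every app benign.
    print(detection_metrics(y_true, [0] * 20))
    # → {'detection_rate': 0.0, 'precision': 0.0, 'accuracy': 0.8}
```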
