Android malware detection model based on Bagging-SVM

doi:10.11772/j.issn.1001-9081.2017082143

Journal of Computer Applications, 2018, Vol. 38, Issue 3: 818-823. DOI: 10.11772/j.issn.1001-9081.2017082143

XIE Lixia, LI Shuang

School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China

Received: 2017-09-04 Revised: 2017-11-12 Online: 2018-03-10 Published: 2018-03-07
Corresponding author: XIE Lixia
About the authors: XIE Lixia (born 1974), female, from Chongqing, associate professor, M.S., CCF member; her main research interest is network and information security. LI Shuang (born 1990), male, from Nanyang, Henan, M.S. candidate; his main research interest is network and information security.
Supported by:
This work is partially supported by the Science and Technology Foundation of Civil Aviation University of China (MHRD201205).

Abstract: To address the low detection rate caused by data imbalance in Android malware detection, an Android malware detection model based on the Bagging-SVM (Support Vector Machine) ensemble algorithm is proposed. First, permission, intent, and component information is extracted from the AndroidManifest.xml file as features. Second, an IG-ReliefF hybrid selection algorithm is proposed to reduce the dimensionality of the data set, and bootstrap sampling is used to construct multiple balanced data sets. Finally, a Bagging-based SVM ensemble classifier is trained on the balanced data sets and used to detect Android malware. In the classification experiments, when the numbers of benign and malicious samples were balanced, Bagging-SVM and the random forest algorithm both reached a detection rate of 99.4%. When the ratio of benign to malicious samples was 4:1, Bagging-SVM raised the detection rate by 6.6% over the random forest and AdaBoost algorithms without reducing detection accuracy. The experimental results show that the proposed model retains a high detection rate and classification accuracy on imbalanced data and can detect the vast majority of malware.
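The feature-extraction step named in the abstract can be illustrated with a minimal Python sketch. It assumes the manifest has already been decoded to plain-text XML (inside an APK the file is stored in a binary XML format) and that each feature is the `android:name` attribute of the relevant element. The function names, the sample manifest, and the binary-vector encoding are illustrative assumptions, not the authors' implementation.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in the Android XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def extract_manifest_features(manifest_xml):
    """Collect permission, intent-action, and component names from a decoded manifest."""
    root = ET.fromstring(manifest_xml.strip())
    name = ANDROID_NS + "name"
    permissions = {e.get(name) for e in root.iter("uses-permission") if e.get(name)}
    intents = {e.get(name) for e in root.iter("action") if e.get(name)}
    components = {
        e.get(name)
        for tag in COMPONENT_TAGS
        for e in root.iter(tag)
        if e.get(name)
    }
    return {"permissions": permissions, "intents": intents, "components": components}


def to_binary_vector(features, vocabulary):
    """Encode a feature dict as 0/1 flags over a fixed, ordered vocabulary (assumed encoding)."""
    present = set().union(*features.values())
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical manifest used only to exercise the functions above.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <application>
    <activity android:name=".MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
      </intent-filter>
    </activity>
    <service android:name=".SyncService"/>
  </application>
</manifest>
"""

features = extract_manifest_features(SAMPLE_MANIFEST)
vocabulary = ["android.permission.SEND_SMS", "android.permission.CAMERA", ".SyncService"]
vector = to_binary_vector(features, vocabulary)
print(features)
print(vector)
```

In practice the vocabulary would be built from all names seen in the training set, and the resulting vectors would then be passed to the feature-selection step.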

Key words: malware, classification detection, Bagging algorithm, Support Vector Machine (SVM), feature selection
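The balanced-bootstrap and voting steps of the ensemble can be sketched as follows. The abstract does not specify the sampling scheme, so this sketch assumes that each bootstrap round draws, with replacement, as many samples from every class as the minority class contains, and that the ensemble predicts by majority vote. To stay dependency-free, a nearest-centroid classifier stands in for the SVM base learner; any class with `fit` and `predict` methods, such as scikit-learn's `SVC`, could be passed as `make_learner` instead. All names here are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(X, y, rng):
    """Draw one class-balanced bootstrap sample (sampling with replacement)."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    # Assumption: every class is resampled to the size of the smallest class.
    n = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for _ in range(n):
            Xs.append(rng.choice(rows))
            ys.append(label)
    return Xs, ys


def fit_bagging(X, y, make_learner, n_estimators=10, seed=0):
    """Train one base learner per balanced bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        Xs, ys = balanced_bootstrap(X, y, rng)
        models.append(make_learner().fit(Xs, ys))
    return models


def predict_bagging(models, X):
    """Combine the base learners' predictions by majority vote."""
    votes = [m.predict(X) for m in models]
    return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]


class NearestCentroid:
    """Stand-in base learner (NOT an SVM): picks the class with the closest mean vector."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c])) for x in X]


# Toy binary feature vectors with a 4:1 benign (0) to malicious (1) ratio.
X = [[0, 0, 1]] * 6 + [[0, 1, 1]] * 2 + [[1, 1, 0], [1, 0, 0]]
y = [0] * 8 + [1] * 2

models = fit_bagging(X, y, NearestCentroid, n_estimators=5, seed=0)
predictions = predict_bagging(models, [[1, 1, 0], [0, 0, 1]])
print(predictions)
```

Because every base learner sees equal numbers of benign and malicious samples, no single learner is dominated by the majority class, which is the motivation the abstract gives for balancing before training.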

CLC number: TP309

XIE Lixia, LI Shuang. Android malware detection model based on Bagging-SVM[J]. Journal of Computer Applications, 2018, 38(3): 818-823.
